FOREWORD


Investigation of the application of computer techniques to the analysis
of the syntactic and semantic structure of natural language was begun in
February 1960 under Air Force Contract AF 30(602)-2185. The project began
with an investigation of the structure of an artificial language suitable for
storage and retrieval of information (see First Quarterly Report, May 1, 1960),
but turned immediately to the development of routines for the analysis of
English preparatory to any translation of natural language materials into an
artificial language. Since it was deemed an essential feature of any
artificial language that each sentence should consist of a single subject and a
single predicate, work was started early on clause boundary marking routines,
work which has continued until the present; many successive routines have been
hand-tested, and several programmed (currently for the IBM 709). In order to
test such routines a taped corpus was essential; for this purpose the book
Planet Earth by Karl Stumpff (Ann Arbor, 1950) was completely key-punched and
taped. For further work on scientific English, an additional corpus of 45
articles (five each for the fields of Mathematics, Physics, Astronomy,
Agriculture, Geology, Chemistry, Engineering, Biology, and Medicine), totaling
about 120,000 words, was selected by standard randomizing procedures under the
direction of Professor Douglas Ellson of the Indiana University Psychology
Department. A list of these articles is presented in Appendix XVII. Since the
routines have not yet been adequately checked out on the preliminary corpus,
this random corpus has not yet been punched.

Since any syntactical routines (including clause bracketing) required
detailed form-class information on all words in the text, a complete dictionary
of Planet Earth was prepared with form-class information coded for each entry.
As the routines have changed, the coding requirements have also been revised,
and we are not prepared to state that our present coding is even yet all that
is needed.

Since two of the requirements of an artificial language were originally
judged to be minimum vocabulary (elimination of synonyms and derivatives) and
automatic coding of as many words as possible, work was also begun early on a
suffix-stripping routine which would automatically convert such words as
"inclusion" into "include + N abstract" with all the codings automatically
attachable to words formed with this suffix. (See Third, Fourth, and Fifth
Quarterly Reports.) Work on this portion of the project has been temporarily
shelved to deal with higher-priority projects; but it is hoped that eventually
this can be taken up and prefixes (such as un-, re-, etc.) also dealt with in
a similar manner. A much reduced and altered version of the affix routines
has, however, been developed for use with the clause routines and the ambiguity
routines. The need for these latter routines was also seen very early; since
form-class assignments are required for clause-bracketing, and since many such
assignments are ambiguous in English, routines were developed, and have been
constantly improved, for eliminating such ambiguities.

To prepare the way for improvement of our suffix routines, the need was
felt for a more complete reverse-alphabetized dictionary than was available
in Walker's Rhyming Dictionary (the only published lexicon with this
arrangement), and a program was written (and is available) for the reversing of
Merriam-Webster's New Collegiate (1959), and five print-outs of this
alphabetization were made (of which two have been forwarded to RADC). This work
was done before the similar project carried on at the University of
Pennsylvania, and it is believed that the two dictionaries in large measure
complement each other.




Some hope was felt at the outset that part of the laborious hand-coding
of dictionary entries might be avoided if a type of "machine learning"
procedure could be adopted in place of algorithms, and so Douglas Ellson
conducted an experiment (see Appendix XVIII) which showed, in our opinion
conclusively, that, while machine performance in syntactic analysis could
be quickly improved up to a certain point, this point fell far below the
requirements of information retrieval, at least with published dictionary
coding only.

Work on semantic analysis has taken different directions at
various periods, and only very tentative preliminary results are available
as yet. The first approach was to consider the dictionary coding of certain
semantic relationships (other than those expressed by affix routines), such
as synonymy, antonymy, hyponymy, etc. The most recent approach has confined
its attention to devising mathematical measures of the degree of synonymy,
using such available reference works as Roget's Thesaurus and [title
illegible]. Future work is envisaged using such larger dictionaries as
Merriam-Webster's New International (Third Edition).

Closely linked with the synonymy measure is the development of the
syntactic agreement measure derived from the special arrangement and ranking of
sentence words called FLEX, which is fully described in this report. Much of
the work on FLEX and the synonymy measure is due to [illegible] T. Cook and
Donald Ritchie.

Work was done for a time (largely by J. P. Thorne, Duane Woolley and Jon
Stewart) on the possibilities of artificial languages having specific logical
properties (see the Sixth Quarterly Report). This was temporarily shelved
when work on FLEX seemed to prove it a more interesting tool for information
retrieval.


Most of those who have contributed in one way or another to this report have
been named above; some portions have been written by Fred W. Householder, Jr.
Many parts of the report are indebted in various ways to work done by all the
workers named in the various quarterly reports. The arduous task of preparing
the appendices accompanying this report has been largely carried out by S. T.
Cook.
Abstract


Under the title Automatic Language Analysis (originally Automation of
General Semantics) research has been directed towards the development of a
device which would process and store completely unedited English language
texts and print out answers to questions regarding these texts presented to
it in their natural language form. The approach followed requires that the
computer itself syntactically analyze input text in order to convert it into
a special form called FLEX, which preserves only that syntactic information
which is useful for data retrieval purposes. In their FLEX forms sentences
can be compared to determine the degree of their relationship to each other
in respect to both word-meaning and propositional meaning. A high correlation
between a text sentence and a question indicates that the text sentence is
a relevant answer.

The problems of constructing a device of the kind proposed are discussed
in some detail. The report also describes an approach to the mechanical
analysis of language. It also contains an account of a version of FLEX
and of certain preliminary experiments in weighting syntactic information.
The fourth section briefly lists the conclusions and recommendations, both
positive and negative, which may be drawn from the work of the project.
General Discussion

The work reported upon here is intended as a contribution to the study
of the problem of direct communication between men and computers through the
medium of natural language. It deals with the design of a device which would
efficiently store a large body of data fed into it in the form of completely
unedited natural language texts, and answer any question posed to it in
natural language regarding these texts. The construction of programs to carry
out certain procedures which would form a necessary part of the operation of
such a device is described in some detail. It could be argued that, as they
stand, these procedures - indeed, the choice of certain of these procedures
for development rather than others - represent fairly drastic compromises when
compared with what might be regarded as the ultimate goal in this field of
research. This is true. The model described here is intended to satisfy the
minimum requirements that a priori can be demanded of such a device if it is
not to produce insignificant results. Its purpose is mainly heuristic.

Lacking such a device it seems difficult, if not impossible, to see how better
results could be obtained. Its realization would establish a context in which
empirical experiments could be conducted. Examination of the results of these
experiments would suggest the best ways in which improvements could be made.

The approach followed differs in certain important respects from other
information retrieval experiments. While obviously having much in common with
mechanical document retrieval systems it differs from them in aiming at entirely
dispensing with the human abstracter who prepares a specially organised text
which is then translated into a machine language for transmission to the
computer; documents are retrieved through the recognition of certain key words
or of a combination of key words occurring in the stored abstract and the
information request.
Although sharing many of the problems encountered in developing
techniques for the production of mechanical abstracts it differs from this
work particularly in the matter of scope. The intention here is to enable
the machine not to select from a text those parts which, in accordance with
certain built-in criteria, are rated as most significant, but to handle the
whole text. This would remove the risk of the total loss of some information
due to the machine having to act under the limitation of restrictive
pre-judgments concerning what is to be considered important in certain kinds
of texts - something which must obviously change with different people and
different times. It also means that a very much higher standard of precision
can eventually be expected. The computer will not merely supply a list of
references to documents which contain data relevant to a particular request
for information. If problems of storage space can be overcome the machine
should be able to print out the actual data itself. Failing this it should be
able to provide something as precise as page and line references.

Without ceasing to be useful for information retrieval purposes the computer
can become a "question-answerer".

An automatic question-answerer, Baseball, is already in operation at
M.I.T.¹ The computer stores information from a restricted field (the date,
place and score of every game played in the American League in 1959) fed into
it in the form of a special machine language. Questions regarding this data,
phrased in natural language (English) and ranging over a fairly wide degree
of lexical and syntactical variety, are automatically answered. The outstanding
feature of the device is its capacity to carry out quite complex procedures which
enable it to answer questions of a considerable degree of complexity.

It can handle a question like "Is it true that every team played at least once
in each ball-park during the month of August?", for example. In this particular
respect Baseball goes beyond the kind of device proposed here. It
differs from it also in two other respects. First, although the machine
automatically processes the questions put to it, the data it stores is fed into
it in an already processed form. Second, the data it operates upon is of
a very special kind. The model for the device appears to be a quiz-competitor.
The machine answers rightly or with "don't know". The device we propose is
modelled upon more usual human behavior. Under ordinary circumstances the
giving of a direct answer to a question is fairly unusual. More typical is
the phenomenon psychologists call "answer by [illegible]". That is to say,
when questioned one tends not to answer the questioner directly but by
presenting him with the information at one's disposal which seems most germane
to his query. It is this facility that we wish the computer to simulate.

1. See "Baseball: An Automatic Question-Answerer", Bert F. Green, Jr., Alice K.
Wolf, Carol Chomsky, and Kenneth Laughery, Proc. WJCC, Los Angeles, Calif.,
May 9-11, 1961.

The advantages of building this capacity into an information-retrieval
device and the difficulties of accomplishing this aim are now discussed.

Consider the case of a computer with the following sentences stored in its
memory: (i) The boy hits the ball and (ii) The girl drinks the coffee. It is
desired to program the computer in such a way that in response to the question
Does the boy hit the ball? Sentence (i) and not Sentence (ii) will be
printed out. It is assumed that no system of labeling sentences is to be
employed. The machine is to interpret the question itself as an instruction
to search the memory for the sentence which matches it most closely. The
question provides a pattern the elements of which are the individual words.
A search is initiated for patterns which coincide with that contained in the
question. This is the simplest information retrieval situation involving a
natural language text and a natural language question. Notice that already
we have to allow for a certain degree of tolerance in matching patterns. If
only exact matches are to be accepted then Sentence (i) will not be retrieved
since it contains one less word (also a different punctuation mark if these are
to be counted). This must obviously be allowed for. We would wish sentences
like (iii) The boy hits the ball hard and (iv) The boy hits the ball every time
to be retrieved in response to this question if they formed part of the
corpus. Notice also that the sentences (v) The boy never hits the ball
and (vi) The boy did not hit the ball would also be considered significant
answers. In matching patterns the occurrence of negative elements can be
ignored. Negative answers are as likely to be relevant as positive answers.
A failure to retrieve any answer at all is to be interpreted as "don't know",
not "no".
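
The tolerance just described can be illustrated by a minimal sketch in a
present-day language (Python); it is not the project's own program. The lists
of negative elements and auxiliaries, and the crude rule that strips a final
-s in place of real morphological analysis, are assumptions made only for this
illustration.

```python
import re

# Assumed (hypothetical) lists; the report does not enumerate them.
NEGATIVES = {"not", "never", "no"}
AUXILIARIES = {"does", "do", "did"}


def normalize(word):
    """Crude stand-in for morphological analysis: strip a final -s."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def content_words(sentence):
    """Lower-case the words, drop punctuation, negatives and auxiliaries."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", sentence)]
    return [normalize(t) for t in tokens if t not in NEGATIVES | AUXILIARIES]


def matches(question, sentence):
    """True if every word of the question pattern occurs in the sentence."""
    stored = set(content_words(sentence))
    return all(word in stored for word in content_words(question))


QUESTION = "Does the boy hit the ball?"
CORPUS = [
    "The boy hits the ball",             # (i)
    "The girl drinks the coffee",        # (ii)
    "The boy hits the ball hard",        # (iii)
    "The boy hits the ball every time",  # (iv)
    "The boy never hits the ball",       # (v)
    "The boy did not hit the ball",      # (vi)
]

for sentence in CORPUS:
    if matches(QUESTION, sentence):
        print(sentence)
```

Under these assumptions every sentence except (ii) is printed; an empty
print-out would correspond to "don't know".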

Now consider a slightly more complicated situation. We substitute for
Sentence (i) Sentence (vii) The boy strikes the ball. The same question
is posed. If we specify that it is the sentence which offers the closest match
that is to be chosen, Sentence (vii) will be printed out and not Sentence (ii).
But now the notion of tolerance has to be considerably extended. The match is
made on the recognition of the common pattern The boy ... the ball. This would
mean that the sentences (viii) The boy kicks the ball, (ix) The boy avoids the
ball and (x) The boy recognizes the ball, if added to the corpus, would be
retrieved at exactly the same time as Sentence (vii). More disturbing still,
if (xi) The boy enjoys the ball is added to the corpus it too will be retrieved
as a relevant answer.

It seems then that The boy ... the ball cannot be regarded as a sufficiently
well-defined pattern for its occurrence in the question sentence and the
corpus sentence to be taken as a criterion for retrieval. This decision
prevents the noise sentences being printed out. It still leaves the problem of
retrieving Sentence (vii), which on all counts is a relevant answer to the
question.


The solution adopted demands that strike be considered as being an
element of the same type as hit. There is, of course, strong intuitive
motivation for this. The notion is reflected, though perhaps not very
precisely, by an entry in a thesaurus - a list of words all substitutable for
at least one other word in the list in some context or other. We have postponed
discussion of the important question of the construction of a thesaurus
for purposes of information retrieval from natural language. For the moment
we assume that the computer is also supplied with a dictionary and that
against each entry is a list of numbers corresponding to those heading
thesaurus entries in some standard work like Roget's Thesaurus. A look-up
routine now reveals that hit and strike have an identical entry. The question
sentence and Sentence (vii) are now treated as sharing the same three-element
pattern. Sentence (vii) is chosen and the rest rejected.
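
The look-up step can be sketched as follows. This is an illustration only:
the entry numbers are invented and are not taken from Roget's Thesaurus, and
the sentences are assumed to have been reduced already to their three content
words in base form.

```python
# Hypothetical dictionary: word -> numbers of the thesaurus entries it is
# listed under. All numbers are invented for this example.
THESAURUS_NUMBERS = {
    "boy": {129},
    "ball": {249},
    "hit": {276},
    "strike": {276},
    "kick": {276, 284},
    "avoid": {623},
    "recognize": {481},
    "enjoy": {377},
}


def same_type(a, b):
    """Two words count as the same element only if their listings are identical."""
    return THESAURUS_NUMBERS[a] == THESAURUS_NUMBERS[b]


def shares_pattern(question, sentence):
    """Compare two content-word patterns element by element."""
    return len(question) == len(sentence) and all(
        same_type(q, s) for q, s in zip(question, sentence)
    )


question = ["boy", "hit", "ball"]
corpus = {
    "vii": ["boy", "strike", "ball"],
    "viii": ["boy", "kick", "ball"],
    "ix": ["boy", "avoid", "ball"],
    "x": ["boy", "recognize", "ball"],
    "xi": ["boy", "enjoy", "ball"],
}

retrieved = [label for label, words in corpus.items()
             if shares_pattern(question, words)]
print(retrieved)
```

With these invented listings only Sentence (vii) is retrieved, since hit and
strike are the only pair of verbs with identical entries.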

Identical listings in a thesaurus are rare and in an ideal thesaurus
probably non-existent. On the other hand partial overlappings are frequent.
It is reasonable to suppose that the words hit and kick, for example, would
occur together in at least one thesaurus cluster. An extension of the present
method would interpret the detection of this overlap as an indication that kick
also fills out the pattern but in a less exact manner. So that The boy kicks
the ball is printed out in response to the question Does the boy hit the ball?
but only after The boy strikes the ball has already been retrieved. These
machine responses can be equated to the replies "Yes" and "No, but he kicked
it". The point is that the second of these might be relevant information.
Whether it actually is or not can only be decided by the operator, not the
machine.

A further improvement is suggested by the observation that in Roget's
Thesaurus the principle of arrangement is such that a list of terms comprising
one entry is likely to be immediately followed by one made up of a list of
their opposites. The fact that a negative answer is likely to be as relevant
as a positive one suggests that in some cases adjacent clusters should be
merged. This would bring hit and miss together in the same cluster, and since
it can be assumed that miss and avoid also occur in the same list, this
extension of the notion of negative responses means that Sentence (ix) will also
be printed out.

Let us assume that each time the machine recognises a match between the
lexical items in a question and those in a stored sentence it makes a simple
computation. Let us say that it scores 1 for a full match (hit : hit or hit :
strike) and ½ for a partial match (hit : kick, hit : avoid). The search for
matches now produces a rough kind of ordering of the sentences in the corpus in
terms of their relevance to the question. Ignoring words such as definite
articles which occur in no thesaurus entries, Sentences (i) and (vii) score 3,
Sentences (viii) and (ix) 2½, Sentences (x) and (xi) 2 and Sentence (ii) 0.
If the program were designed in such a way as to ensure that the sentences were
printed out in descending order there would be no need to decide beforehand on
a cut-off point, i.e. a score below which no sentence is printed out. Once the
operator observes one sentence in the print-out that he regards as irrelevant
he can stop the production of output on the assumption that the rest of the
information contained in the corpus is irrelevant to his particular need.
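
The scoring and ordering described above can be sketched in a few lines. This
is an illustrative sketch, not the project's program: the cluster numbers and
the small table of inflected forms are invented, and cluster 10 is assumed to
be the merged hit/miss cluster to which avoid also belongs.

```python
# Invented thesaurus listings: word -> set of cluster numbers.
THESAURUS = {
    "boy": {1}, "girl": {2}, "ball": {3}, "coffee": {4},
    "hit": {10}, "strike": {10},
    "kick": {10, 11}, "avoid": {10, 12},
    "recognize": {20}, "enjoy": {21}, "drink": {30},
}
# Assumed table reducing inflected forms to dictionary forms.
LEMMAS = {"hits": "hit", "strikes": "strike", "kicks": "kick",
          "avoids": "avoid", "recognizes": "recognize",
          "enjoys": "enjoy", "drinks": "drink"}


def content_words(sentence):
    """Keep only words with a thesaurus entry (articles etc. are ignored)."""
    tokens = sentence.lower().rstrip("?.").split()
    lemmas = [LEMMAS.get(t, t) for t in tokens]
    return [w for w in lemmas if w in THESAURUS]


def word_score(a, b):
    """1 for identical listings, 1/2 for a partial overlap, otherwise 0."""
    if THESAURUS[a] == THESAURUS[b]:
        return 1.0
    if THESAURUS[a] & THESAURUS[b]:
        return 0.5
    return 0.0


def score(question, sentence):
    """Sum, over the question words, of the best match found in the sentence."""
    stored = content_words(sentence)
    return sum(max((word_score(q, s) for s in stored), default=0.0)
               for q in content_words(question))


def rank(question, corpus):
    """Order the corpus by descending score; ties keep their corpus order."""
    scored = [(label, score(question, text)) for label, text in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


CORPUS = {
    "i": "The boy hits the ball",
    "ii": "The girl drinks the coffee",
    "vii": "The boy strikes the ball",
    "viii": "The boy kicks the ball",
    "ix": "The boy avoids the ball",
    "x": "The boy recognizes the ball",
    "xi": "The boy enjoys the ball",
}

for label, value in rank("Does the boy hit the ball?", CORPUS):
    print(label, value)
```

Because the output is ordered, the operator rather than the program decides
where to stop reading, which is the point made above about the cut-off score.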

Increasing the number of potential matches by associating with each word
a list of thesaurus entries does not remove all the problems. In fact it
introduces new problems. Consider the Sentences (xii) The well-behaved girl
sips her coffee and (xiii) Coffee is good to drink and the question Does the
young lady drink coffee? Counting matches in the way previously suggested
we find that both text sentences make the same score: (xii) (young) lady :
girl ½, sips : drink ½, coffee : coffee 1, Total 2; (xiii) coffee : coffee 1,
drink : drink 1, Total 2. Thus although the system enables (xii), which has
only one exact match, to score the same as the irrelevant (xiii), which has
two, it fails to establish any priority for its retrieval. It also means that
the noise sentence (xiv) A taste for liquid honey is characteristic of immature
female bears, where taste, liquid, immature and female all score ½ points,
would also be retrieved. The difficulty cannot be overcome by making a rule
that matching patterns must not only contain the same words but the same words
in the same order. One would obviously want to retrieve Coffee is drunk
by all well-bred [illegible], which would be rejected by such a rule.

To overcome this difficulty we advance the hypothesis that the lack of
discrimination in the information-retrieval systems so far developed arises
from the fact that they utilise only semantic information and fail completely
to make use of syntactic information. They fail to recognise that (to put it
in the simplest fashion possible) Sentence (xiii) is "about" coffee and its
relationship to drinking. It is reasonable to claim that this is reflected in
the statement that (to use the most neutral grammatical terminology available)
the subject of (xii) is girl, while the subject of (xiii) is coffee. Sentences
which have words, or words from the same thesaurus cluster, in common are more
closely related from our point of view than those which do not. But the factor
which controls how closely these sentences are related is the extent to which
the common elements share the same places in their syntactical pattern.
Matching procedures based on syntactic information can be made extremely
comprehensive. A match between the subject of Sentence (a) and the subject of
Sentence (b) is obviously more significant than a match between the subject of
Sentence (a) and the object of Sentence (b), but this second match is still
important. It is more important than, say, a match between the subject of
Sentence (a) and a modifier of the object of Sentence (b). Yet this fact too
might not be without significance and provision for it can be made in the
system.
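
One way of making such provision is sketched below for the question Does the
young lady drink coffee? and Sentences (xii) and (xiii). Everything numerical
here is an assumption made for illustration: the role labels are supplied by
hand rather than by a recognition routine, and the weights 1, 0.5 and 0.25
merely respect the ordering of importance stated above; the report does not
give these values.

```python
# Lexical scores as in the thesaurus scheme: 1 for a full match,
# 1/2 for a partial match. Pairs not listed score 0.
LEXICAL = {
    ("lady", "girl"): 0.5,
    ("drink", "sip"): 0.5,
    ("drink", "drink"): 1.0,
    ("coffee", "coffee"): 1.0,
}


def role_weight(role_a, role_b):
    """Assumed weights: same role > subject/object swap > any other pairing."""
    if role_a == role_b:
        return 1.0
    if {role_a, role_b} == {"subject", "object"}:
        return 0.5
    return 0.25


def score(question, sentence):
    """For each question word take its best role-weighted match in the sentence."""
    total = 0.0
    for q_role, q_word in question.items():
        total += max(
            (LEXICAL.get((q_word, s_word), 0.0) * role_weight(q_role, s_role)
             for s_role, s_word in sentence.items()),
            default=0.0,
        )
    return total


# Hand-labelled analyses (hypothetical output of a recognition routine).
question = {"subject": "lady", "verb": "drink", "object": "coffee"}
sentence_xii = {"subject": "girl", "verb": "sip", "object": "coffee"}
sentence_xiii = {"subject": "coffee", "verb": "be", "complement": "drink"}

print(score(question, sentence_xii), score(question, sentence_xiii))
```

With these weights Sentence (xii) scores 2.0 and Sentence (xiii) 0.75, so the
tie produced by purely lexical counting is broken in favour of (xii).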

The question of the explanatory power of grammars has only recently
begun to interest linguists. Its importance is a direct outcome of the
decision taken by certain linguists to make the goal of syntactical studies not
the elaboration of procedures by means of which linguistic data can be
classified, but the construction of grammars which are themselves theories of
sentence structure. Since the observable data such a theory has to cover is
infinite it must be generative. A grammar of this kind is structured as
a calculus which generates all and only the well-formed sentences of a
language.² Given a complete grammar of a language, for any well-formed
sentence of that language there should exist in it an ordered set of formulae
which would generate this sentence. This is expressible as an ordered
(bracketed) string of syntactic symbols and constitutes a syntactical analysis
of the sentence. Since there is no limit on the number of generative grammars
that can be constructed for a language the question of choosing between
competing grammars becomes crucial. It has been suggested that the most
powerful grammar is that which produces analyses which most fully explain the
native speaker's intuitions about the relationships (samenesses and
differences) between sentences. Preliminary investigations into the amount of
syntactic information necessary for efficient data retrieval from natural
language texts suggest that, since only relatively coarse distinctions between
the syntactic functions of matching words appear to be significant, a
comparatively weak grammar will prove adequate.

2. This trend is particularly associated with the work of Noam Chomsky. A
clear statement of his views is given in "The Logical Structure of
Linguistic Theory", Preprints of Papers for the Ninth International Congress
of Linguists (Cambridge, Mass. 1962). See also M. A. K. Halliday,
"Categories of the Theory of Grammar", Word, Vol. 17 (1961).

The adequacy of weak grammatical models for information-retrieval
purposes is important for two reasons.

(i) The complete grammar which linguists regard as the ultimate goal is
unlikely to come into existence for a considerable time. At the moment we do
not possess a single grammar for a natural language which comes anywhere near
to satisfying demands of completeness that might reasonably be made of it when
no other purpose underlies its construction than the study of language itself.

(ii) The linguist who constructs a generative grammar is interested in
data only in so far as it tests his theory. Given a sentence which a native
speaker agrees is well-formed he is concerned only that his grammar should
contain an ordered set of formulae that will generate that sentence. The
question how he decides which set of formulae in the grammar satisfies this
condition is of no interest to him. It is not part of the theory. A grammar
does not tell one how to recognize the analysis of sentences. The process of
checking an "ideal" sentence generated by the grammar against the actual
sentence observed affords no difficulty to human beings. Human beings have
intuitions about language anyway and an illuminating grammar is illuminating
precisely because it in some sense incorporates these intuitions. We want to
enable the machine to make use of grammatical information because it does not
have intuitions about language. The problem of how a machine is to determine
the syntactic analysis of any sentence is, therefore, crucial. The application
of computers to the task of data-retrieval from natural language demands the
construction of a heuristic device which will enable them to derive from any
sentence an analysis which (a) corresponds to an ordered set of formulae in
a grammar and (b) meets the strong intuitive demands of fitness when applied to
the sentence from which it has been derived. A similar device is essential
to machine translation. Machine translation demands reference to an extremely
powerful grammar. A characteristic of the weak models useful for information
retrieval purposes is the relatively small number of formulae they contain.
The problems of construction of a recognition device for this grammar are
thereby greatly eased.


Any really significant reduction in the complexity of syntactic analyses,
however, is not as easy to obtain as it might seem. It is an inescapable fact
that if a recognition device is to work at all efficiently it will almost
certainly produce analyses containing more information than can be used for
data retrieval purposes. The reason for this is simple. Even if all that is
demanded of a recognition device is that it should indicate nothing more than
the subject, verb and object of a sentence, it can only do so as the result
of first assigning labels to all, or almost all, the elements in the sentence.
The problem here is not just that certain words can be either nouns or verbs,
or verbs or adjectives, etc., but that any noun can be either a subject or an
object or the modifier of a subject or object, or the modifier of a modifier,
etc. The last difficulty arises out of the recursive tendencies of natural
language. Sequences of this kind, the boy with the dog with long ears with
white spots, etc. are possible. Generative formulae for this might be:

Nominal Group -> Nominal Group + Nominal Group
Nominal Group -> Nominal Group + Nominal Group

We may be concerned only with the identification of the element
that enters into the first of these formulae, but this is impossible until
the structure of the whole string has been established.

To avoid the danger of being weighed down by redundant information we
propose that the initial processing both of data and questions consist of a
translation into an artificial language in which only those syntactic
relationships which prove significant in information retrieval occur. The analyses
produced by the recognition device are not stored but interpreted as a set of
instructions for rewriting the sentence into the equivalent form in the
artificial language. In this way we are relieved of the problem of constructing
a complete grammar for English. It is only necessary to ensure that every
analysis will contain sufficient information to enable the sentence for which
it has been produced to be rewritten as a well-formed sentence in the
artificial language. To ensure the completeness of the grammar of the artificial
language is a relatively simple matter. The artificial language will contain
only a small number of sentence types to which the large number of English
sentence types will have to correspond. Artificial languages for information
retrieval differ, therefore, in a very important respect from those created
by logicians. Instead of reducing ambiguity they promote it. Certain forms
(actives and passives for example: The boy hit the ball. The ball was hit by
the boy) are related syntactically, a fact that can be incorporated into a
grammar, but only at the cost of some added complexity. In the "collapsed"
grammar of English we propose, it is merely necessary to ensure that both
forms are rewritten identically in the artificial language.

Matching procedures are carried out on sentences in their translated
forms. For ease of reference the artificial language has been given the name
FLEX. Since its vocabulary consists of thesaurus clusters rather than words
it is characterized by minimum syntactic organization and maximum semantic
organization. It differs most noticeably from natural languages in not
displaying any of the syntactic variations which prevent the development of what
Yngve calls "depth."³ In its present form nearly the whole grammar of FLEX is
contained in the formulae:

(1)  S -> Subject + Predicate

(2)  Subject -> Noun

(3)  Noun -> (Modifier) + Noun

(4)  Predicate -> Verb + Object

(5)  Verb -> (Modifier) + Verb

(6)  Object -> Noun

The formulae (3) and (5) are recursive, it being possible to add an
infinite number of modifiers to subjects, verbs and objects though in practice
a limit is set at four. The rules are not ordered; (3) is applied again after (6).
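
The six formulae and the practical limit of four modifiers can be illustrated
by a small recognizer. This is only a sketch under stated assumptions: the
symbol names "Modifier", "Noun" and "Verb", the helper names, and the choice of
Python are ours and are not part of the project's 709 programs.

```python
from typing import List

MAX_MODIFIERS = 4  # practical limit on modifiers stated in the text


def _head_group(symbols: List[str], head: str) -> bool:
    """True if symbols is zero to four Modifiers followed by the head symbol.

    This covers the recursive formulae (3) and (5).
    """
    return (
        bool(symbols)
        and symbols[-1] == head
        and all(s == "Modifier" for s in symbols[:-1])
        and len(symbols) - 1 <= MAX_MODIFIERS
    )


def is_flex_sentence(symbols: List[str]) -> bool:
    """Check a symbol string against formulae (1)-(6).

    A well-formed string is: noun group + verb group + noun group.
    """
    if symbols.count("Noun") != 2 or symbols.count("Verb") != 1:
        return False
    n = symbols.index("Noun") + 1  # end of the Subject
    v = symbols.index("Verb") + 1  # end of the Verb group
    if v < n:
        return False
    return (
        _head_group(symbols[:n], "Noun")       # (2), (3)
        and _head_group(symbols[n:v], "Verb")  # (5)
        and _head_group(symbols[v:], "Noun")   # (6), then (3) again
    )
```

Under these assumptions the string Modifier + Noun + Verb + Noun is accepted,
while a subject carrying five modifiers is rejected.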

This particular language is almost certainly too simple to promote
efficient data retrieval. Its usefulness at the moment consists in producing a
context in which hypotheses concerning the adequacy of various types of thesauri
and the weighting of syntactic information can be readily tested. Some experiments
in testing various functions for weighting syntactic matches are described in Part III.

From another point of view FLEX may be regarded not as a language, but as
a device for (a) splitting each (single) sentence of a natural language into two
parts, "Subject" and "Predicate", and (b) assigning weights to each word in each
part according to their "importance". The assumption is made that, in general,
words which are grammatically superordinate, or "head" words, will be more
important for information, and modifiers will be less so, while such items as
conjunctions, particles, prepositions, articles, and the like are of no importance whatever.

II,  1.  Construction  of  a  Heuristic  Device  for  Syntactic  Pattern 
Recognition  in  English. 


The need for a device which will assign a significant syntactical
analysis to any English sentence was explained in Part I. A syntactical analysis
is defined as an ordered sequence of symbols, all of which occur in the
same grammar, and in which each symbol corresponds to one word in the
sentence. A grammar is defined as a calculus of formulae which generates
all and only the well-formed sentences of a language. A correct
syntactical analysis is defined as an ordered string of symbols which could
be generated by a grammar. A significant analysis of a sentence is one which
is correct and which meets strong intuitive demands of fitness when assigned
to that sentence.

3.  V. H. Yngve, "A Model and an Hypothesis for Language Structure,"
Proceedings of the American Philosophical Society, Vol. 104 (1960), pp. 130-8.

An idea of the problems encountered in developing such a program can
be gained by considering the simple three word sentence These points stand.

The sentence is syntactically unambiguous, (Demonstrative Adjective + Noun) +
Verb being the only significant analysis. A computer analysis, however, is the
output of look-up routines operating with a dictionary which lists for each
word in it all the syntactic roles that can be undertaken by that word.

(An account of the structure of this dictionary and the look-up routine
will be found in Section 2 of this report and associated appendices.)

Since it is uneconomical to enter both boy and boys, jump and jumped, jump and
jumping etc., in the dictionary, and since also these suffixes contain essential
syntactic information, it is necessary for the look-up routine to work in
conjunction with a routine which recognises boys as boy + s and interprets the
s as indicating plurality in the noun. This routine is also described in
Section 2. In the present case the look-up routine would find two entries
for each word: These - Demonstrative Adjective and Demonstrative Pronoun,
Points - Noun and Verb, Stand - Noun and Verb. Applying a rule which states
that each word can fulfil only one syntactical function at a time in an
analysis results in a total output of eight different unranked strings.
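
The count of eight follows from taking one entry per word in every possible
combination (2 x 2 x 2). A minimal sketch, assuming a hypothetical in-memory
table ENTRIES in place of the taped dictionary and its look-up routine:

```python
from itertools import product
from typing import Dict, List

# Hypothetical stand-in for the dictionary entries returned by look-up.
ENTRIES: Dict[str, List[str]] = {
    "These": ["Demonstrative Adjective", "Demonstrative Pronoun"],
    "points": ["Noun", "Verb"],
    "stand": ["Noun", "Verb"],
}


def candidate_strings(words: List[str]) -> List[str]:
    """Return every string in which each word fulfils exactly one function."""
    choices = (ENTRIES[w] for w in words)
    return [" + ".join(combination) for combination in product(*choices)]


strings = candidate_strings(["These", "points", "stand"])
```

Only one of the eight strings, Demonstrative Adjective + Noun + Verb, is the
significant analysis; the enumeration itself does not rank them.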
The demand that the machine should select only the significant
analyses imposes too strong a condition. It is sufficient if it retains
only those which are correct. In practice it can be assumed that any
output that can be shown to be a correct analysis will also be a significant
analysis of the sentence from which it is derived. A maximum program,
therefore, would guarantee the correctness of all analyses printed out. The
difficulties of attaining such an objective are perhaps insurmountable. In
its place we offer an approach which has as its objective the recognition of
nearly all incorrect analyses. In this connection we offer the tentative
hypothesis that we select one syntactical analysis of a sentence over all
other possible analyses not so much because it is the right one as because
it is the only one that is not wrong. In the case where there are two right
analyses the sentence is syntactically ambiguous. In the same way if the
machine could recognise incorrect analyses then it could be assumed that
any not rejected on this count would be correct.

It is impossible to supply the machine with a list of incorrect
analyses. There is no reason to suppose that such a list would be any
shorter than a list of correct analyses. Probably both are infinite. The
solution we adopt arises out of three observations:

(1)  The most frequent and intractable cases of ambiguity of form class
assignment seem (at least in English) to occur when one of the assignments
is verb.

(2)  A study of analysed sentences indicates that an operational
definition of a clause as that part of a sentence which contains one and
only one verb (excluding auxiliaries and modals when these are followed by
other verbs) is quite workable.

(3)  In the course of constructing a generative grammar many of the
greatest problems are encountered in the area covering the generation of
sentences comprising more than one clause. It is here that problems of
selection - that is, problems of deciding how far the individual strings making
each clause should be developed before they are associated in the course
of generating the complete sentence - appear to be particularly troublesome.

The difficulty of finding the simplest theoretical statement indicates that
there are a large number of specific features to be taken into consideration.
This in turn suggests that a list of incorrectly juxtaposed clauses, described
in terms of these features, would provide criteria for rejecting nearly all
incorrect analyses. This would also cover one-clause sentences since the
number of clauses which can combine with the null class is strictly limited.

The procedure we propose for mechanical recognition of syntactical
analysis consists of six parts.

1.  Formation of a list of syntactic symbols for each sentence by
dictionary look-up.

2.  Resolution of ambiguous assignments where one of the assignments
is verb. Accepting the operational definition of clauses given above, this will
mean that at the conclusion of this operation the analysis will contain
sufficient information for

3.  ordering into clauses to be imposed on the string.

4.  Checking against a list of incorrect clause combinations. Detection
of an error here must, by definition, mean that a mistake has been made
earlier in the resolution of an ambiguous assignment involving the form
class verb. In this case we enter

5.  The rules employed for resolving form class ambiguities in 2
are arranged in descending order according to their efficiency. The
number of the rule used to resolve an ambiguous assignment is carried
forward with its output. In this way once a mistake has been detected
at the clause level candidates for re-assignment are easily recognised. In
cases where there is more than one candidate the one with the highest rule
number is changed first. The process is continued until a legitimate result
is obtained.

This simplified scheme may be improved through re-entry into the
resolution cycle. At this level we may discover that a word group taken as
an idiom by the phrasification does not in a particular instance operate
as a unit. Such a phrase would have to be dismantled before re-entry.

Also prepositional phrases which act as adverbs can be so noted; and all
words resolved by "perfect" rules can be noted as unambiguous. Such
information would obviously strengthen the verb resolution of phase 2.

6.  Ordering into phrases within the clause and the resolution of the
remaining ambiguous assignments are carried out.

The number of rules used in 2 is kept down to a comparatively small
number by adopting the strategy of giving all uncertain cases the assignment
"verb." This means that in the instances where this is wrong an extra clause
is interpolated into the sentence, making the chances that a correct analysis
will be formed very slight and ensuring a later correct re-assignment.

Programs for Parts 1 and 2 have been written and tested. Work on
part 3 is nearly complete. Work on part 4 has been postponed until all
the earlier ones are fully operational. The reasons for this are partly that
the results obtained from the earlier routines are so good that it is
possible that certain incorrect juxtapositions of clauses can be ignored
as impossible outputs of the earlier routines. Investigations so far made indicate
that the hypothesis, that, if a description (even though partial) of the
clause level stages in the generation of a sentence can be derived from it,
then this will contain sufficient information to cause most incorrect
analyses produced by the look-up routine immediately to be rejected, is a valid
one. So far only a handful of cases have been discovered where wrong
analyses could be derived which would not produce clause juxtapositions
that one would expect to find in the list of incorrect combinations proposed
above. An example of how this will work is provided by the sentence:

"One of the most satisfactory laboratory experiments in the field
of mechanics is the measurement of surface tension by means of a Du Nouy
tensiometer."

The look-up routine would produce the following string:

Adjective/Pronoun + Preposition + Definite Article + Adjective/Adverb
+ Adjective + Noun + Noun Plural/Present Tense Verb + Preposition
+ Definite Article + Noun/Present Tense Verb + Preposition + Noun
+ Auxiliary Verb + Definite Article + Noun + Preposition + Noun/Present
Tense Verb + Noun + Preposition + Noun/Present Tense Verb + Preposition
+ Indefinite Article + Noun + Noun.

The ambiguity routines would resolve all the ambiguous assignments
correctly except in the case of the word experiments which is wrongly assigned
as a verb. The resultant string is now

(Adjective/Pronoun + Preposition + Definite Article + Adjective/Adverb
+ Adjective + Noun + Verb + Preposition + Definite Article + Noun
+ Preposition + Noun) (+ Auxiliary Verb + Definite Article + Noun
+ Preposition + Noun + Noun + Preposition + Noun + Preposition
+ Indefinite Article + Noun + Noun)

The routines for dividing the sentence itself into clauses would break
the string into two clauses in the manner indicated. Routine 4 would
indicate that the juxtaposition of a clause containing a present tense
verb and no marker of subordination (which, who, etc.) with a clause headed
by an auxiliary verb is incorrect. Routine 5 would reveal experiments, field,
surface and means as possible candidates for re-assignment. Experiments is
the one resolved by the highest numbered (i.e. weakest) rule. Its assignment
is changed to noun. Clause analysis on the new string now produces only
one clause. Since this clause can stand by itself to form a whole sentence,
no listing for clause containing present tense verb and no marker of
subordination with the null is found by Routine 4 and the new analysis
is allowed to stand.
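
The re-assignment step of Routine 5 can be sketched as follows. The sketch
rests on several assumptions of ours: the record layout Item, the rule numbers
in the usage below, and the stand-in legitimacy test are hypothetical, and we
assume that each candidate is tried singly against the original string, in
descending order of rule number, which the text does not spell out.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Item:
    word: str
    assigned: str                      # form class chosen in part 2
    alternative: Optional[str] = None  # the other reading from look-up
    rule: Optional[int] = None         # number of the rule that chose it


def reassign(
    items: List[Item],
    is_legitimate: Callable[[List[Item]], bool],
) -> Optional[List[Item]]:
    """Change the weakest-rule candidate first until the clause check passes."""
    if is_legitimate(items):
        return items
    candidates = sorted(
        (i for i, it in enumerate(items)
         if it.rule is not None and it.alternative is not None),
        key=lambda i: items[i].rule,
        reverse=True,  # highest numbered (weakest) rule first
    )
    for i in candidates:
        it = items[i]
        trial = list(items)
        trial[i] = replace(it, assigned=it.alternative, alternative=it.assigned)
        if is_legitimate(trial):
            return trial
    return None  # no single re-assignment gives a legitimate result


# Toy usage: rule numbers 7 and 2 are invented for illustration.
sentence = [
    Item("experiments", "Verb", "Noun", rule=7),
    Item("field", "Noun", "Verb", rule=2),
    Item("is", "Auxiliary Verb"),
]
# Stand-in for Routine 4: reject any string still holding a full verb.
result = reassign(sentence, lambda s: not any(it.assigned == "Verb" for it in s))
```

In this toy run experiments, carrying the highest rule number, is changed to
noun first and the check then passes, as in the worked example.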


2.  Construction of a Routine to Resolve Ambiguous Form Class Assignments.

For reasons already explained it was decided in the first instance
to construct routines to resolve only those ambiguities involving the possible
assignment verb. This meant the construction of five separate sets
of rules for the resolution of the ambiguities: noun/present tense verb
(point, stage, face etc.), adjective/present tense verb (clean, complete, close
etc.), noun/past participle (and sometimes present tense also) (cut, set,
felt, thought), adjective/past participle (fixed, interested, given, etc.)
and present participle/adjective/noun (meaning, using, running, etc.)
(See Appendix XIV, the Ambiguity Routines). It was subsequently decided to
add a sixth set of rules in which the ambiguities involved in words of
such idiosyncratic distribution as like, except, might, can, will, even, still,
well, and a few others would be resolved.

A rule is an instruction to search the environment of an ambiguous
item for the presence of another item or items which are diagnostic in
this context. For example, in each set of rules one of the first to apply
is that which initiates a search for a definite article immediately in
front of the ambiguous item. If this is found the ambiguous symbol in the output
of the look-up routine is rewritten as noun. If the particular diagnostic
item cited in the rule is not discovered the next rule is applied and the
search for another diagnostic item started. Since the input for these
routines consists of the form class assignments for each word in the
sentence read from left to right, the machine resolving each ambiguity as
it comes to it, most rules direct searches to the left hand side of the
environment; the right hand side being likely to contain only information
which is itself ambiguous. In the interests of a simple machine solution
very few rules demand a search extending over more than three items in
either direction. Since the distribution of adverbs in English is extremely wide
they rarely serve any diagnostic function and most rules include the
instruction not to count adverbs as part of the environment of an ambiguous
item.
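
One such rule, and the way rules are tried in order, might be sketched as
below. The function names, the symbol spellings, and the convention of giving
the default assignment a rule number one higher than the last rule are our
assumptions; the actual rule sets are those of the appendix cited in the text.

```python
from typing import Callable, List, Optional, Tuple

Rule = Callable[[List[str], int], Optional[str]]


def definite_article_rule(symbols: List[str], i: int) -> Optional[str]:
    """Rewrite item i as Noun if a definite article stands immediately in
    front of it, adverbs not being counted as part of the environment."""
    j = i - 1
    while j >= 0 and symbols[j] == "Adverb":
        j -= 1  # skip adverbs when looking to the left
    if j >= 0 and symbols[j] == "Definite Article":
        return "Noun"
    return None  # diagnostic item not found; the next rule is applied


def resolve(symbols: List[str], i: int, rules: List[Rule]) -> Tuple[str, int]:
    """Apply the rules in order; return the assignment and the rule number.

    Uncertain cases fall through to the assignment Verb.
    """
    for number, rule in enumerate(rules, start=1):
        result = rule(symbols, i)
        if result is not None:
            return result, number
    return "Verb", len(rules) + 1  # assumed numbering for the default
```

The rule number returned with each assignment is what allows a later stage to
pick out the candidates resolved by the weakest rules.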

The ordering of the rules is important for two reasons. First,
certain items are diagnostic only in the absence of some other diagnostic
items. The rules must, therefore, be arranged in such a way as to ensure that a search
for the first set has already been undertaken and failed before the
others are looked for in the environment. Second, since the general routine
demands that a decision be made in every case, and since
the rules will sometimes produce wrong results, it is important that
information as to how likely it is that the criteria used as a basis of a
particular decision will produce mistakes should be readily available. In
particular this provision enables us to exploit the strategy of resolving
most uncertain cases as verbs. This in turn means that the number of rules
in each routine can be kept down to a bare minimum.

Many items are used in more than one set of rules. The definite
article, for example, is diagnostic when it occurs in the environment of an
item designated by the look-up routine as noun/present tense verb, or
an item designated adjective/present tense verb, or as adjective/past
participle, as noun/past participle, or as present participle/adjective/noun.
Many items regularly fulfill the same diagnostic functions as others. The
discovery of an indefinite article in the environment of an ambiguous item nearly
always leads to exactly the same result as the discovery of a definite
article. A great economy is attained in the set of the machine solutions
by assigning to each word an indicator code which indicates all the
diagnostic functions it can fulfill. As each word enters the routine it is
considered from two points of view: first to see if it is ambiguous and,
if so, which kind of ambiguity it displays; second to see if it can play any
part in the resolution of the ambiguous form class assignments of other
words, i.e. which indicator codes belong to it. A description of the programming
of these routines follows. The full set of the rules in the forms most convenient
for programming and hand checking will be found in Appendix XIV to this section.

Computer Technique for the Solution of Form Class Ambiguities

Data Preparation

The raw material used in the computer solution to the word ambiguities
problem can be divided into two categories: dictionary data, and scientific
text data.

Dictionary Data

The former category was originally prepared as a tape file for use
with the IBM 650 computer programs. This dictionary tape file has been
converted by an IBM 7090 routine⁴ to a format which is suitable for the
IBM 1401 and 709. At the time of its conversion, the dictionary file was
modified both in content and in structure to its present format. (See
Appendix II, File 1.) The preparation of this category of data is, therefore,
complete. Additions, deletions or changes to the file will hereafter be
made through the updating program which will be described in a later section.

Text Data

The initial scientific text, Planet Earth,⁵ was also prepared as a
tape file for use with the IBM 650 computer programs and it also has been
converted by the above-mentioned routine.⁴ See Appendix II, File 2. A special
IBM 709 program is being written which will be used for assigning to words
of this text only the text identification described below.

The preparation of scientific text data is a continuous part
of this data processing system developed for this project. A description of
the manner in which this data is prepared follows.

Text data preparation is carried out in two stages: the conversion
of the printed material to punched cards; and the conversion of the punched
card to tape files.


4.  D. B. Flanigan, IBM. [Title illegible] ... 7090.

5.  Planet Earth, Karl Stumpff.

Text-to-card

The text is punched (See Appendix I, Format 2) in much the same
way as it is typed. The following conventions are observed in the keypunch
operation.

1.  Words and punctuation are separated by blanks.

2.  Every paragraph and every page begins a new card.

3.  When a new word begins a card, the first column is blank.

4.  When a new paragraph begins a card, the first two columns are blank.

5.  When a new page begins a card, the first three columns are blank.

6.  When both (3) and (5) occur, the first four columns are left blank.

7.  When both (4) and (5) occur, the first five columns are left blank.

8.  A sentence or a paragraph may be continuous over many cards.

9.  The last 8 columns are reserved for card sequencing.
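
The leading-blank conventions (3) to (7) amount to a small code carried in the
first columns of each card. A minimal sketch of how a reader might decode it,
assuming an 80-column card image with text in columns 1-72, and assuming (the
conventions do not say so) that a card with no leading blank continues a word
broken at the end of the previous card:

```python
from typing import FrozenSet, List, Tuple

# Number of leading blanks -> what begins on this card, per conventions 3-7.
STARTS = {
    0: frozenset(),                       # assumption: continues a word
    1: frozenset({"word"}),               # convention 3
    2: frozenset({"paragraph"}),          # convention 4
    3: frozenset({"page"}),               # convention 5
    4: frozenset({"word", "page"}),       # convention 6
    5: frozenset({"paragraph", "page"}),  # convention 7
}


def read_card(card: str) -> Tuple[FrozenSet[str], List[str], str]:
    """Split a card image into (what begins here, items, sequence field)."""
    text, sequence = card[:72], card[72:80]  # last 8 columns: sequencing
    blanks = len(text) - len(text.lstrip(" "))
    if blanks not in STARTS:
        raise ValueError(f"unexpected number of leading blanks: {blanks}")
    items = text.split()  # words and punctuation are separated by blanks
    return STARTS[blanks], items, sequence
```

The sequence field is returned untouched, since the conventions reserve it but
do not describe its contents.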

Card-to-tape 

The  primary  purpose  of  the  card-to-tape  conversion  operation  is 
to  provide  an  efficient  form  of  input  to  the  data  processing  system. 

Its  function  is  to  provide  each  text  item  with  an  identification  number 
corresponding  to  its  place  in  the  text. 

The conversion is to be done on the IBM 1401. The logical records
produced contain the text item, with the capital marker moved to a
convenient location, the identification of the text and the
identification for each item in the text. Punctuation is moved over one
character in its record.

The format of the 10-character text identification number is:

PPP  P  SS  C  U  II

where

PPP  is the page number in the text.

P  is the number of paragraph beginning on the page.

SS  is the sentence number within the paragraph.

C  is the clause number within the sentence.

U  is the analysable unit within the sentence.

II  is the item number in the sentence.

The convention is followed that a paragraph is considered as
being all on the page on which it begins. If a paragraph is more than
a page long the beginning of the next paragraph has a page number which
is greater by (at least) two than the page number of the preceding paragraph.

Unit numbers are assigned later by the unit analysis routine. Clause
numbers will be assigned by the clause analysis routines.
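
The 10-character identification is a fixed-width field: three digits of page,
one of paragraph, two of sentence, one of clause, one of analysable unit and
two of item. A minimal sketch of splitting it, in which the names TextId and
parse_text_id are ours:

```python
from typing import NamedTuple


class TextId(NamedTuple):
    page: int       # PPP
    paragraph: int  # P
    sentence: int   # SS
    clause: int     # C
    unit: int       # U
    item: int       # II


WIDTHS = (3, 1, 2, 1, 1, 2)  # field widths, 10 characters in all


def parse_text_id(ident: str) -> TextId:
    """Split a 10-digit text identification into its six fields."""
    if len(ident) != sum(WIDTHS) or not ident.isdigit():
        raise ValueError(f"not a 10-digit identification: {ident!r}")
    fields, pos = [], 0
    for width in WIDTHS:
        fields.append(int(ident[pos:pos + width]))
        pos += width
    return TextId(*fields)
```

Because the fields are fixed in position, sorting the identifications as
strings on selected digit positions reorders the file without parsing them.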

File Standards

For purposes of uniformity within the data processing system, all
files conform to the following standards:

1.  Tapes are written at low density;

2.  Logical record length within any file is fixed;

3.  Block (physical tape record) length within any file is fixed;

4.  Data files are all BCD files;

5.  In addition to its data blocks, each data file contains one
header record and one trailer record as described in the IOCS manual.⁶

6.  IBM Reference Manual C28-6100-1, 709/7090 Input/Output Control
System, pp. 22-23.

6.  Each blank tape mounted for use as output of the 709 computer
programs contains the Blank Tape Label described in the IOCS Manual.⁷

The data processing system for resolving word ambiguities requires
two types of maintenance programs: sort programs and update programs.

Sorting is carried out by means of the standard IBM 709 Sort.⁸ Sort
sequences are designated both in the file descriptions (Appendix II) and in
the block diagrams (Appendix III).

Updating of all files of the system may be carried out by means of
a single IBM 709 program which is currently being written.⁹

Update Program

The particular function which the program performs at any one time
depends upon the specifications stated by the user. These specifications
include a general description of the characteristics of the file to be
modified and of the manner of modification (Appendix I, Format 4).

Specifications are punched on cards and converted to tape by a standard
IBM 1401 program in which each card becomes one fourteen-word record. (See
Appendix II, File 3). This "change" tape and the tape file to be modified are
the two inputs to the file update program. The primary output of the program
is the modified file. A secondary output file can be produced if requested
in the specifications.


7.  IBM Reference Manual C28-6100-1, 709/7090 Input/Output Control
System, p. 21.

8.  IBM Reference Manual C28-6036, 1959, ... for the IBM 709 Data Processing
System.

9.  Instruction files are being updated by standard Comptrand
package update.

Data files are updated by means of cards #3 and #4. Depending on the
use of the parameters in these cards the program may be directed to:

1)  Locate a logical record by its position in the file, and add N
logical records immediately after the one located.

The actual records added follow immediately behind the specification
card #4. Each of the N records must begin in column 1 of a card; the
record may be continued from card to card (78 columns per card) until the
full size of the record has been punched. Continuation cards are identified
by a "C" in column 79. Final blank padding of records need not be punched.
Since the record and block size are both stated in card #3, a full record
will always be added.

2)  Locate a record by its position in the file, and delete, beginning
with that record or the following one, N records. No records need follow
this specification card #4.

It is possible in updating data files to delete certain records and to
add others at the same point in the file. This must be done by means of two
specification cards #4, each referring to the record after which records
are to be added. The specification card with the "Add" parameter and the
records to be added must precede the specification card with the "Delete"
parameter.

3)  Locate a record by its position, and compare that record of the file
with the match field stated on the specification card #4. When a record
is found in which the two fields agree, delete or add N records, or change
that record by the replacement field of the specification card #4.

4)  Locate¹¹ a specified field in every record of the file. Whenever
the field specified matches the field stated on the specification card #4,
change the record by the replacement field of the specification card #4.¹⁰

The dictionary file (See Appendix II, File 1) may be updated by record
in any of the above-mentioned ways. However, in addition, the logical
records within the dictionary file which contain the count of the number of
other records within an alphabetic grouping are modified in accordance with
the addition or deletion of records during updating.

Data files may also be examined by means of the specification cards #3
and #4. When used for this purpose, the specification cards direct the
program to locate a specified field in every record of the file. Whenever
the record field matches the field specified in card #4, the record is placed
on the secondary output file.

All specification cards #4 except those using the "Add" parameter may
request secondary output. When requested in connection with deletions, the
secondary output file will contain all records which have been deleted from
the modified file.

When requested in connection with changes, the secondary output file will
contain all records which have been modified on the primary output file.
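
The field-matching mode of the update program, with its primary and secondary
outputs, can be pictured with a short sketch. It is not the 709 program: the
function name, the in-memory lists standing in for tape files, the 0-based
field position, and the requirement that the replacement field have the same
length as the match field (so that record length stays fixed) are all our
assumptions.

```python
from typing import List, Optional, Tuple


def match_and_change(
    records: List[str],
    start: int,
    match: str,
    replacement: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Scan every record; where the field at `start` equals `match`,
    optionally replace it, and copy the record to the secondary output.

    With replacement=None the file is only examined, not changed.
    """
    if replacement is not None and len(replacement) != len(match):
        raise ValueError("replacement must keep the record length fixed")
    end = start + len(match)
    primary: List[str] = []
    secondary: List[str] = []
    for record in records:
        if record[start:end] == match:
            if replacement is not None:
                record = record[:start] + replacement + record[end:]
            secondary.append(record)
        primary.append(record)
    return primary, secondary
```

Used without a replacement field, the same scan selects all records sharing
some common characteristic, leaving the primary file unchanged.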


10. Neither the matching field nor the replacing field may exceed the
size of the record. The match field must always be completely
stated on the specification card but may continue from card to card
until it is completely stated. No characters - including zeros -
may be omitted from the field statement.

11. By this direction all specification cards #4 are applied to each
record of the file. This makes possible a change which is universal
to all records of the file, or, if no change is requested, the
selection of all records which have some common characteristic.
Data Modification

There are two simple types of text modification which prepare it for further
language analysis. The first allows the removal of obvious parenthetical
expressions from the middle of a sentence for separate analysis. The second
allows us to consider simple groups of words which normally function as one
to be treated as a single item.

Separation into Analysable Units

This routine (See Appendix IV) is to set off pairs of parentheses
and dashes and the items enclosed by them so that the item immediately
following such a unit may be considered as immediately following the item
which precedes this unit. This is done by assigning unit (analysable unit)
numbers to each item.

Such units can then be placed at the end of the sentence by a standard
I.B.M. sort routine. Sorting on digits 1-6 and 8-10 sequences the file into
"analysable unit" order. Sorting on digits 1-6 and 9-10 would sequence
the file into original text order. For example:

"It is necessary (for purposes of this analysis) to consider these items
together." will be analysed:


Item          Code

It            [illegible]
is            0011010202
necessary     [illegible]
(             [illegible]
for           [illegible]
purposes      0011010106
of            0011010107
this          0011010108
analysis      0011010109
)             0011010109
to            [illegible]
consider      0011010210
these         0011010011
items         0011010012
together      [illegible]
.             0011010214
and this can be sorted, as above, into:

"It is necessary to consider these items together. (for purposes of
this analysis)"
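
The effect of the two sorts can be illustrated with a short modern sketch. The
Python below is not the I.B.M. 709 routine: the function names and the
(token, unit, position) record are assumptions made for illustration, dashes
and nested parentheses are not handled, and the text-identification digits are
left out. It shows only that sorting on unit and then item number moves the
parenthetical to the end, while sorting on item number alone restores the text.

```python
def assign_units(tokens):
    """Return (token, unit, position) records.

    Unit 0 is the main sentence; each parenthesised stretch, brackets
    included, receives the next unit number. Nesting is not handled.
    """
    records = []
    current_unit = 0
    next_unit = 1
    for position, token in enumerate(tokens, start=1):
        if token == "(":
            current_unit = next_unit
            next_unit += 1
        records.append((token, current_unit, position))
        if token == ")":
            current_unit = 0
    return records


def unit_order(records):
    # Analogue of sorting on the unit digit and then the item digits.
    return [tok for tok, _, _ in sorted(records, key=lambda r: (r[1], r[2]))]


def text_order(records):
    # Analogue of sorting on the item digits alone.
    return [tok for tok, _, _ in sorted(records, key=lambda r: r[2])]


tokens = ("It is necessary ( for purposes of this analysis ) "
          "to consider these items together .").split()
records = assign_units(tokens)
print(" ".join(unit_order(records)))
print(" ".join(text_order(records)))
```

Keeping the item number in every record is what makes the rearrangement
reversible: nothing is lost when the file is put into analysable unit order.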

Phrasification

To allow such phrases as "of course" to be treated as preposition +
noun is to complicate the grammatical analysis of the sentence, and reduce
the range of application of some of the rules. Such phrases have fixed
functions in larger language units and fall into word-form classes. The
Phrasifying Routine (Part I) (see Appendix V, Phrasification Flow) makes
such treatment possible by recognizing the several word-records of the
words in the phrase and replacing them with a single word-record for the
entire phrase. Recognition is done by comparison of a phrase entry
prepared from input word-records with a list of phrase entries. If the
constructed phrase entry is identical with one on the list, a phrase has
been found. The list is prepared in advance by linguists. The routine,
then, serves an entirely mechanical function, albeit somewhat complicated.

The mechanical aspect is complicated by two conditions of the problem:
(1) the possible overlapping in the texts of phrases on the list and (2) the
necessity of considering all combinations of words in the text not ruled
out by certain practical and linguistic limitations from being possible
phrases. The first condition necessitates a decision about what sequence
of words is to be taken as the phrase in the various possible cases.


The general rule followed is (a) where the phrases begin with the same
word, the longest and (b) where phrases overlap, the one which ends
last is taken as the phrase. A practical limitation laid down is that
phrases are no longer than twenty-four characters (including breaks
between words). A linguistic limitation is that we do not look for phrases
which include punctuation.
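
The selection rule and the two limitations can be sketched as follows. This is
an illustrative Python reading of the paragraph above, not the Phrasifying
Routine itself: the function names, the use of a set for the phrase list, and
the (start, end) spans are all assumptions.

```python
import string


def is_punctuation(token):
    # Treat a token made up only of punctuation characters as punctuation.
    return all(ch in string.punctuation for ch in token)


def find_phrases(words, phrase_list, max_chars=24):
    """Return (start, end) word spans chosen as phrases (end exclusive)."""
    candidates = []
    for start in range(len(words)):
        if is_punctuation(words[start]):
            continue
        for end in range(start + 2, len(words) + 1):
            if is_punctuation(words[end - 1]):
                break  # phrases never include punctuation
            entry = " ".join(words[start:end])
            if len(entry) > max_chars:
                break  # practical limit, counting breaks between words
            if entry in phrase_list:
                candidates.append((start, end))

    chosen = []
    for start, end in sorted(candidates):
        if chosen and start < chosen[-1][1]:
            # Same first word or overlap: keep whichever ends last.
            if end > chosen[-1][1]:
                chosen[-1] = (start, end)
        else:
            chosen.append((start, end))
    return chosen


words = "as a matter of course , of course".split()
phrases = {"as a matter of", "a matter of course", "of course"}
print(find_phrases(words, phrases))
```

In the usage lines, "a matter of course" displaces "as a matter of" because it
ends later, and the first "of course" is dropped because it overlaps the chosen
phrase without ending later.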

Information Gathering

With data files prepared, updated and maintained in various sort
sequences, the next stage in the data processing system is one of
information gathering. A single program, the Affix-Dictionary program, has been
written to gather information from the dictionary file and append it to
the text file.

For the purposes of grammatical analysis, it is necessary to have
preliminary assignments of words to form classes. Since the same word,
however many times it occurs, will have the same preliminary word class
assignments, it is most efficient to make this assignment by a mechanical
dictionary look-up after the text word has been coded.

In addition, there are rules in English correlating regularities in word
structures with correct form class assignments. Several words are often
such that they can be thought of as "complex" words formed from one
stem by the addition of prefixes and/or suffixes. These "complex" words
often can be given a correct preliminary form class assignment on the basis
of their affixes alone. This allows the number of words in a dictionary
designed to provide word form-class information to be reduced radically. It
is necessary to leave in only those words for which a preliminary word
form-class assignment cannot be made on the basis of affixes alone. With
such a dictionary, if a dictionary lookup is performed and then affixes
are noted and the rules assigning form-classes on the basis of them are
applied, all words for which adequate provision has been made will receive
their correct preliminary word form-class assignments.

It will, moreover, be useful to include the stems of the lawful
"complex" words in the dictionary and to perform a lookup for them also
provided there are items of information common to the words formed from a
stem which are of use in grammatical analysis. The value of the rules of
form-class assignment on the basis of affixes will be primarily the
reduction in the length of the dictionary. In the mechanical table look-up, the
length of the table is almost never a negligible consideration. Where a
table the size of a dictionary is in question, any systematic reduction in
size is bound to be important.

As it turns out there are items of information common to words whose
form-class assignment depends upon affixes being added to the same stem.
The reduction in dictionary length when only stems of these words are
listed is still significant. So, a dictionary including incorrigibles
and stems is used in the mechanical assignment of preliminary form-classes.
The other dictionary information available in this way includes form
subclass (see Appendix VI, Sub-Class codes) information which is not
predictable from affixes. As would be expected from the above, which rule
is applied to produce the preliminary form-class specification of a
particular "complex" word depends upon the structure of the word. But it also
turns out that if the additional dictionary information about the stem is
utilised and the rules are made functions of items in that information,
the rules can be made more effective. This type of rule reduces the
number of incorrigibles in the dictionary. Another practical decision
leads us to employ rules about suffixes only. We employ therefore, in our
mechanical analysis of texts, a Suffix-Dictionary routine for the production
of dictionary information about words in a text.

Affix-Dictionary Program

Input files to the Affix-Dictionary Program (see Appendix VIII,
Affix-Dictionary Flow) are two: the dictionary file (Appendix II, File #1)
and the text file (Appendix II, File #2). The program uses a binary look-up
to avoid sorting and resorting. Since punctuation marks have been placed so
that they have a zero in initial position, these are first. However, in
these cases, dictionary information is zero and the main program is
bypassed.

Output from the program is an appended text file (Appendix II, File #4)
and an Error File (Appendix II, File #5). The Error File is produced when
there is no matching dictionary entry for a word of the text or for its
stem after affix removal. When this situation arises, additional entries
must be made to the dictionary file by means of the Update Program. The
Affix-Dictionary Program must then be repeated with the updated dictionary
file as input.
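
The look-up logic just described (whole word first, then stem after suffix
removal, otherwise an entry on the Error File) may be sketched in Python. The
suffix rules, form-class names, and record layout below are hypothetical and
stand in for the codes of Appendix VII; the bypass for punctuation is omitted.
Only the binary search, here via the standard `bisect` module, corresponds
directly to the binary look-up mentioned above.

```python
from bisect import bisect_left

# Hypothetical suffix rules: (suffix, form class given when the stem is found).
SUFFIX_RULES = [("ness", "NOUN"), ("ly", "ADVERB"), ("s", "NOUN/VERB")]


def binary_lookup(keys, word):
    """Binary search in a sorted key list; return the index or None."""
    i = bisect_left(keys, word)
    return i if i < len(keys) and keys[i] == word else None


def affix_dictionary(text_words, dictionary):
    """dictionary maps a word or stem to its form classes.

    Returns (appended_records, error_file). Each appended record is
    (word, form_classes, suffix_removed_or_None).
    """
    keys = sorted(dictionary)
    appended, errors = [], []
    for word in text_words:
        if binary_lookup(keys, word) is not None:
            appended.append((word, dictionary[word], None))
            continue
        for suffix, form_class in SUFFIX_RULES:
            stem = word[: -len(suffix)]
            if word.endswith(suffix) and binary_lookup(keys, stem) is not None:
                appended.append((word, form_class, suffix))
                break
        else:
            # Neither the word nor any stem was found: goes to the Error File.
            errors.append(word)
    return appended, errors


toy_dictionary = {"quick": "ADJECTIVE", "rock": "NOUN/VERB", "the": "ARTICLE"}
print(affix_dictionary(["the", "quickly", "rocks", "zebra"], toy_dictionary))
```

Words left on the error list correspond to the case in which the dictionary
must be updated and the program repeated.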

By means of this program the 10-word record (Appendix II, File #2)
representing a word of text increases to a 20-word record. The
Dictionary Program inserts into the expanded record (Appendix II, File #4)
codes representing all the form classes to which the word of text may
belong, including a special code indicating a preferred form class, if any.
Subclassifications of form classes and another special code to indicate
that an item is an "absolute breaker" (i.e., the item is the first word of
a clause) are inserted. Coded representations of affixes either removed
(before locating in the dictionary) or not removed (because the item was
found) but indicated nevertheless, as well as the dictionary code which
directs the program to test the ending of the dictionary word itself, are
included in the expanded record. Codes for irregular plurals, irregular
past tenses, irregular superlatives and comparatives, past participles,
and past forms which are always main verbs are also included (see
Appendix VII, Dictionary Codes).

Although more information must be known about each text word before
ambiguities can be resolved, the remaining information can be gathered
in the Resolution of Ambiguities program which will be described in the
next section.

Once the appended file has been produced it is sequenced into the
analysable unit order (Text Identification) in order that the
ambiguities may be resolved sequentially from left to right.

Resolution of Word Ambiguities

The resolution of word ambiguities within an analysable unit is
carried out on the IBM 709 Computer in a manner which imitates the method
employed by the linguist. The procedure in its simplest form might be
stated as follows:

1) Within an analysable unit (left to right), note all words which
are members of only one form class, i.e. never ambiguous;

2) Within the same unit (left to right), resolve the ambiguities of
certain words of unusual distribution, such as "as" or "like". (This step
involves the application of one of a group of rules designed specifically
for the resolution of ambiguities of words in this class. Examination of
the immediate environment of the words is required.)

3) Within the same unit (left to right), determine the particular type
of ambiguity, such as noun/verb present tense or adjective/verb past tense.
(The ambiguity is resolved by applying a suitable rule from a group designed
for the resolution of this type of ambiguity. The application of the rule
requires testing the immediate environment of the ambiguous word.)

The program written to follow this procedure consists of a control
routine (See Appendix IX, Control Program) and a group of subroutines.

Control Routine

The control routine reads the analysable unit from the appended text
file into the computer memory, noting (step 1 above), as it reads, the
words which are unambiguous. For each of these words the control routine
forces the appropriate English word, e.g., NOUN, into a specified part of
its memory record.

When the unit has been completely read into the memory, the control
routine begins its second pass. In this pass it determines the words belonging
to the class defined as "words of idiosyncratic distribution" and transfers
to a subroutine which applies rules sequentially until it resolves the
ambiguity of the word. The subroutine then supplies the English word for the
form class and the resolving rule number to the specified part of the memory
record for this word of text. Control returns again to the control routine
which proceeds in this fashion until it again reaches the end of the unit.

When all such words have been resolved, the control routine makes its
third and final pass through the unit. In this pass it determines the
particular type of ambiguity of all other ambiguous verbs and transfers them to
the appropriate subroutines for resolving them. These subroutines function
in the manner described in the preceding paragraph.
When all words of the unit have been resolved, the control program
writes the unit whose records (Appendix II, File #6) now show the resolved
form class and the resolving rule number.

The procedure is repeated until all the units of the text have
been analyzed.
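
The three passes of the control routine can be summarized in a brief sketch.
The Python below is only a schematic restatement: the dictionary-style records,
the argument names, and the keying of subroutines by the set of possible form
classes are assumptions, and the resolving functions are supplied by the caller
rather than taken from the actual rule sets.

```python
def control_routine(unit, idiosyncratic, resolve_idiosyncratic, subroutines):
    """unit: list of records with 'word' and 'classes' (a set of form classes).

    Adds 'resolved' and 'rule' to every record and returns the unit.
    """
    # Pass 1: words with a single form class are never ambiguous.
    for rec in unit:
        if len(rec["classes"]) == 1:
            rec["resolved"], rec["rule"] = next(iter(rec["classes"])), None
    # Pass 2: words of idiosyncratic distribution.
    for i, rec in enumerate(unit):
        if "resolved" not in rec and rec["word"] in idiosyncratic:
            rec["resolved"], rec["rule"] = resolve_idiosyncratic(unit, i)
    # Pass 3: remaining ambiguous words, dispatched by type of ambiguity.
    for i, rec in enumerate(unit):
        if "resolved" not in rec:
            kind = frozenset(rec["classes"])
            rec["resolved"], rec["rule"] = subroutines[kind](unit, i)
    return unit


unit = [
    {"word": "the", "classes": {"ARTICLE"}},
    {"word": "like", "classes": {"VERB", "PREPOSITION"}},
    {"word": "works", "classes": {"NOUN", "VERB"}},
]
toy_subroutines = {frozenset({"NOUN", "VERB"}): lambda u, i: ("NOUN", 2)}
control_routine(unit, {"as", "like"}, lambda u, i: ("PREPOSITION", 1), toy_subroutines)
print([(r["word"], r["resolved"], r["rule"]) for r in unit])
```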

Indicator  Subroutine 

Omitted from the above description of the control routine is reference
to a special subroutine, the Indicator Subroutine (Appendix X, Indicator
Flow Chart). This subroutine completes the information gathering process
begun in the Affix-Dictionary program and provides the control program and
the remaining subroutines with sufficient facts for determining the type of
ambiguity and for applying the specific rules for its resolution.

Resolution of form-class ambiguities depends upon the analysis of
regularly recurring environments (indicator situations). First these
indicator situations were broken down into parts: characteristics of the
ambiguous item itself (for example, being the first word in the sentence,
capitalization, etc.), characteristics of immediately preceding words,
characteristics of preceding word + 1, characteristics of preceding word
ignoring non-prepositional adverbs, and so on. Indicator categories were
then set up and codes were given to each of these, indicator codes, which could
be used in the machine. For example, one part of the indicator situation for
several rules among the six rule sets is the presence of a modal, copulative,
or auxiliary verb. For one situation a member of this class is required to
be immediately in front of the ambiguous item; for another, non-prepositional
adverbs may be ignored. Again, for others, a member of this class must
immediately follow the ambiguous item, in one case immediately following a word
belonging to another category which itself follows the ambiguous item; in
another, adverb and adverb/adjectives may be ignored. Finally in one rule it
is only required to be the next verb. Nevertheless there is only one
indicator code for this class - a "1" in the 10th position of the first indicator
word. This means that each word in the sentence is tested to see if it is a
member of this class, and if it is, a "1" is placed in the arbitrarily
determined position of the arbitrarily chosen indicator word; otherwise there
is a "0" in this position. Since this class is a subclass of a slightly
larger indicator category a "1" would be put in the predetermined place
indicating that the word is a member of this larger class also. Since both modals
and auxiliaries are themselves members of other indicator categories,
"indicators" must be placed in several places for such an item. Before any attempt
is made to resolve the verbal ambiguities every word in the sentence must be
tested with regard to these indicator classes (see Appendix XI, Indicator Codes).

On first consideration the task, performed by the control routine, of
determining the type of ambiguity of a word seems relatively straightforward.
It can be shown, however, to be quite complex, involving many computer
instructions. For this reason the control program transfers to the Indicator
Subroutine. The Indicator Subroutine performs the necessary tests and classifies
each word by inserting a number in the Ambiguous Word Code (Appendix II, File
#6). The control routine need then make only a single bit test to determine
which type of ambiguity exists and, thereby, determine which subroutine must
be entered.

In providing sufficient information for the subroutines to operate
efficiently, the Indicator Subroutine is even more valuable. An example
of a rule from one of the subroutines will serve to illustrate its value.

Is the preceding item to? If yes, does the ambiguous item have s as
affix? If yes, take as NOUN. If no, see if the item before to is a nominal
[illegible] the ambiguous item or accord, [illegible], antagonistic,
attributable, basic, complementary, contradictory, contrary, equivalent,
foreign, hostile, inimical, liability, opposite, proportion(al), regard(s),
insistent, respect(s), sensitive, similar or supplementary, or if previous
verb or adjective is part of ascribe, attack, attribute, belong, attain,
[illegible], commit, convert, oppose, pertain, reconvert, relate, or
subject. If yes, take as NOUN. If no, take as VERB.

Obviously the rule is complex even in the number of questions it asks
before a resolution can be made. Each of the subroutines has between 20
and 40 rules of varying complexity. The additional complication of
determining, for example, if the word in question is a member of one of the groups
mentioned above increases the rule's complexity from a programming
standpoint and decreases its flexibility.

Flexibility is a most important feature of this entire data processing
system. It is especially important in the program for resolving ambiguities.
There have been changes both in rules and rule ordering. There have also
been insertions, deletions, and changes in word groupings such as the groups
underlined in the above illustration. It is anticipated that when the results
of the computer programs are studied, more changes will be inevitable. A decision
was made, therefore, that in order to maintain maximum flexibility the rules
performed by the subroutines should be stated simply and directly in terms of
computer instructions and that the determination of "belonging" to classes
should be separated from the rules.

In the example above, the Indicator Subroutine "indicates" that a word
of the unit does or does not belong to the group of words (give, etc.) by
simply storing a "1" or "0", respectively, in a certain position of a computer
word in the memory record. The "1" and "0" are known as indicator codes; the
computer word in the memory record is known as an indicator code word. There are
about 105 indicator codes. The groups which can be classified by means of
indicator codes are varied. Among them are the following:

1) All words which are either object pronouns, or indefinite pronouns.

2) All words which are either possessive adjectives, or ambiguous
precisely as adjective/pronouns, or ambiguous nouns.

3) All words which are either past participial adjectives or words
which end in ing.

The control routine calls upon the Indicator Subroutine at several
different times. By so doing, the information recorded in the indicator
code words of the memory records is maintained in its most precise form for
use by the subroutines. Instead of testing a word against a long list, for
example, to determine whether or not a rule is suitable for resolving a
word's ambiguity, the subroutine need test only a single bit position of an
indicator code word.
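
The saving described here, one bit test in place of a list search inside every
rule, can be shown with a small sketch. The bit positions, the membership list,
and the form-class names in the Python below are invented for illustration;
the actual assignments are those of Appendix XI.

```python
# Hypothetical bit positions within one indicator code word.
MODAL_COPULA_AUX = 1 << 0
OBJECT_OR_INDEFINITE_PRONOUN = 1 << 1
ENDS_IN_ING = 1 << 2

# Hypothetical membership list, kept outside the rules themselves.
MODAL_COPULA_AUX_WORDS = {"is", "was", "will", "may", "has"}


def indicator_word(word, classes):
    """Build the indicator code word for one text word."""
    bits = 0
    if word in MODAL_COPULA_AUX_WORDS:
        bits |= MODAL_COPULA_AUX
    if classes & {"OBJECT_PRONOUN", "INDEFINITE_PRONOUN"}:
        bits |= OBJECT_OR_INDEFINITE_PRONOUN
    if word.endswith("ing"):
        bits |= ENDS_IN_ING
    return bits


def has_indicator(bits, flag):
    # A rule needs only this single-bit test, not a search of a word list.
    return (bits & flag) != 0


code = indicator_word("will", set())
print(has_indicator(code, MODAL_COPULA_AUX), has_indicator(code, ENDS_IN_ING))
```

Because membership is decided in one place, a change to a word grouping alters
only the list, and the rules that test the bit are left untouched.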

Other  Subroutines 

It is by means of the subroutines that the computer resolves the
ambiguous words of the text. There are six subroutines which correspond to the
six types of ambiguities which are to be resolved. Each subroutine is made
up of a) a control program, b) a rule table, and c) coding which represents,
in computer terms, each of the rules in the set for resolving the specific
ambiguity.

The subroutine control program is standard for all the subroutines.
It may be stated as follows:

1) Advance a rule counter C.

2) Locate from a rule table the starting address of the coding
corresponding to Rule C.

3) Transfer to the coding for Rule C.

The coding for Rule C either resolves the ambiguity or returns to
step 1) above. If the ambiguity is resolved, the resolved part of speech
and the resolving rule number C are supplied to the specified portion of the
text word in memory. The counter is reset to zero and the subroutine returns
to the main Control Routine.

The rule table consists of a list of symbolic addresses corresponding
to the entry points to coding for each rule in the set. The subroutine
executes the rules by transferring in sequence to the symbolic addresses in
this table.

The use of a rule table adds another feature of flexibility to the
system. By rearranging the sequence of the symbolic addresses in the table
it is possible to rearrange the order in which the rules of the set are
applied. It is further possible to eliminate the application of certain
rules by simply omitting their symbolic addresses from the table. Rules may
also be added by supplying the necessary coding for the rule, assigning it a
symbolic address, and inserting this address in the desired place in the
table. These changes are made by reassembling the program. The rule number
is not actually attached to any piece of coding, but corresponds to the order
of rule application.
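
The standard subroutine control program and its rule table lend themselves to a
compact illustration. In the Python sketch below a list of functions stands in
for the table of symbolic addresses; the two sample rules and their names are
hypothetical, the first being a much reduced form of the rule quoted earlier.
What happens when no rule resolves the word is not stated in the text, so the
final return value is an assumption.

```python
def run_subroutine(rule_table, unit, index):
    """Apply the rules in table order until one resolves the word.

    Each rule takes (unit, index) and returns a part of speech, or None
    to hand control back for the next rule.
    Returns (part_of_speech, rule_number).
    """
    counter = 0
    while counter < len(rule_table):
        counter += 1                     # 1) advance the rule counter C
        rule = rule_table[counter - 1]   # 2) locate the coding for Rule C
        result = rule(unit, index)       # 3) transfer to that coding
        if result is not None:
            return result, counter
    return None, None  # assumption: behaviour here is not given in the text


def rule_s_after_to(unit, index):
    # Hypothetical reduced rule: after "to", an item with s as affix is a noun.
    if index > 0 and unit[index - 1] == "to" and unit[index].endswith("s"):
        return "NOUN"
    return None


def rule_default_verb(unit, index):
    # Hypothetical final rule that always resolves.
    return "VERB"


table = [rule_s_after_to, rule_default_verb]
print(run_subroutine(table, ["to", "works"], 1))
print(run_subroutine(table, ["to", "work"], 1))
```

Reordering, omitting, or adding entries in `table` changes the order of rule
application without touching the coding of any rule, and the rule number
reported is simply the position reached in the table.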

The coding for a rule in the set may require the examination of the
indicator codes, the form classes, and other characteristics of the ambiguous
word or of the words which precede or follow it. When examination of words
other than the ambiguous word is required, the subroutine gains access to
these words by means of special "search" routines.

The search routines (see Appendix XII - The Search Routines) were developed
to meet the common need of all the subroutines. There are eight search
routines: four to locate words preceding the given word and four to locate
words following.
The Absolute Breaker Subroutine

In addition to absolute breaker items which always begin a clause,
there are simple situations, involving only two or three words, which always
indicate that at least one of the items begins a clause. These situations
normally indicate that any search must stop there. They may also serve to
limit the possibilities for an ambiguous item which precedes one of them.
This program has been written to place the breaks as indicated in the appendix
(see Appendix X[illegible], Absolute Breaker Lists).


Construction of a Routine to Mark Clause Divisions in a Partially
Analysed Sentence

A syntactical analysis may be defined as an ordered sequence of syntactical
symbols. The device to produce syntactical analysis for any English
sentence, which we are describing, treats the processes of assigning syntactic
symbols to words and of ordering these symbols as being interdependent.

Sufficient information is derived for a string of symbols produced by
the look-up routine to allow a partial ordering into clauses to be imposed upon
it. This partially ordered string in its turn contains sufficient information
to enable either a proper assignment of symbols to be made for all words
and ordering into phrases within the clauses to be carried out, or corrections
in previous wrong assignments of syntactic symbols to be made resulting in a
re-ordering into clauses, in which items improperly labeled can then be
assigned an ambiguous symbol and ordering into phrases completed.

Division into clauses is carried out in many cases on improper strings
of symbols, i.e. strings containing ambiguous symbols. Since the ambiguity
routines described above have already been called in there are no improper
strings containing the element verb. A search is initiated at the end of
the string and continued forward to the beginning until either a verb or
an absolute breaker is found. An absolute breaker is a word like which, what,
because or a phrase like just as though, in order to which regularly stands
at the beginning of a clause. These words and phrases are contained in the
dictionary and are marked as absolute breakers. The members of another set
of absolute breakers are identified by a search routine. They are either
juxtapositions of items which are impossible within the same clause such as
Verb + Subject Pronoun or items which can occur together within the same phrase
and whose juxtaposition indicates that the first element is joining two clauses,
such as that + definite or indefinite article. Present participles,
infinitives and past participles ending in -en like given, written, etc., when not
preceded by auxiliary verbs are also absolute breakers. Lists of absolute
breakers currently recognised by the routine are given in Appendix [illegible]. The
beginning of a sentence counts as an absolute breaker.

In the case where the search finds an absolute breaker, the beginning
of a clause is marked as occurring there and a new search is initiated at this
point. In the case where the search finds a verb before coming to an absolute
breaker this fact is registered. If, after this, an absolute breaker is
found before another verb, a clause division is marked at the absolute breaker
and a new search starts. If another verb is found before an absolute breaker
the situation becomes somewhat more complicated. It is certain that somewhere
between the two verbs a clause division has been crossed since no clause
can contain more than one verb (unless one of them is an auxiliary or a modal).
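
The backward search can be sketched as follows. The Python is illustrative
only: the tag names "VERB", "AUX" and "BREAKER" are assumed labels for what the
look-up and ambiguity routines would supply, and the function merely records
pairs of verbs found with no absolute breaker between them, leaving the
placement of the division to later rules.

```python
def mark_clauses(tags):
    """Scan from the end of the unit toward the beginning.

    Returns (breaks, pending): breaks are indices where a clause begins;
    pending holds (left_verb, right_verb) index pairs with no absolute
    breaker between them.
    """
    breaks, pending = [], []
    last_verb = None
    for i in reversed(range(len(tags))):
        if tags[i] == "VERB":            # auxiliaries and modals carry "AUX"
            if last_verb is not None:
                pending.append((i, last_verb))
            last_verb = i
        if tags[i] == "BREAKER" or i == 0:
            # The beginning of the sentence counts as an absolute breaker.
            breaks.append(i)
            last_verb = None
    return sorted(breaks), sorted(pending)


# "I hope that he will arrive tomorrow", with "that he" tagged as a breaker.
print(mark_clauses(["PRONOUN", "VERB", "BREAKER", "PRONOUN", "AUX", "VERB", "ADVERB"]))
# "He lifted the sack filled with flour"
print(mark_clauses(["PRONOUN", "VERB", "ARTICLE", "NOUN", "VERB", "PREPOSITION", "NOUN"]))
```

The second call yields no internal break but one pending pair, which is the
situation in which the characteristics of the two verbs must be examined.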

In order to discover where this division lies it is necessary in most
cases to examine the characteristics of the two verbs involved. For this
it is necessary to make use of the sub-class information contained in the
dictionary. Particularly relevant is the sub-classification of verbs:
transitive, intransitive and transitive/intransitive. The term transitive
here is given a special interpretation. It is applied not to verbs which
must always take an object (i.e., must be followed by a nominal group not
headed by a preposition) but to those which always do so when they are the
main verb of a sentence. The verb fill for example is marked in the dictionary
as transitive, since it will only occur without an object when it occurs
as a passive participle or in a passive phrase (Cf. He filled the sack with
flour. He lifted the sack filled with flour. The sack was filled with flour).

This provides the basis for a rule for determining clause boundaries.

In a situation in which a clause boundary has to be established between
two verbs (what we shall call from here a redistribution situation) if the
right hand verb is transitive and has an -ed suffix, and is not followed
by an object then the clause boundary is to be drawn immediately in front
of it as in He lifted the sack/filled with flour. It doesn't matter which
sub-class the left hand verb belongs to; nor what tense it is. In these
cases it will always form part of a degenerate clause, functioning as a
post-nominal modifier of the preceding noun. The importance of the
sub-classification transitive and intransitive provides the rationale for beginning
the clause division routine at the end of the sentence. Once the end of a
clause has been established (and in nearly all cases the end of the
last clause is the end of the sentence) then to discover whether the verb
in that clause has an object is a comparatively simple procedure. It is only
necessary to establish that somewhere before the end of the clause there
occurs a noun or some element of a nominal group such as a definite or
indefinite article, or adjective, or a demonstrative pronoun, or an object
pronoun, etc. which does not have a preposition immediately in front of it.
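
A literal reading of this rule and of the object test can be put in a few
lines. The Python below is a sketch under stated assumptions: the tag names,
the set of nominal elements, and the `transitive` set of verb forms are
hypothetical stand-ins for the dictionary sub-class codes, and the object test
follows the wording above without further refinement.

```python
# Hypothetical tags for elements of a nominal group.
NOMINAL = {"NOUN", "ARTICLE", "ADJECTIVE", "DEMONSTRATIVE", "OBJECT_PRONOUN"}


def has_object(tags, verb, clause_end):
    """True if a nominal element not directly preceded by a preposition
    occurs after the verb and before clause_end."""
    for i in range(verb + 1, clause_end):
        if tags[i] in NOMINAL and tags[i - 1] != "PREPOSITION":
            return True
    return False


def ed_rule(words, tags, transitive, right_verb, clause_end):
    """Return the index at which the clause break falls, or None if the
    rule does not apply to the right hand verb."""
    word = words[right_verb]
    if (word.endswith("ed") and word in transitive
            and not has_object(tags, right_verb, clause_end)):
        return right_verb
    return None


words = "He lifted the sack filled with flour".split()
tags = ["PRONOUN", "VERB", "ARTICLE", "NOUN", "VERB", "PREPOSITION", "NOUN"]
print(ed_rule(words, tags, {"lifted", "filled"}, 4, len(words)))
```

The usage line places the break immediately in front of "filled", as in the
example He lifted the sack/filled with flour.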

Another important sub-class, as far as its usefulness in clause determination
is concerned, is that to which all verbs that can take a full clause as
their object belong. Examples include verbs like believe, hope and notice
(I believe that he will arrive tomorrow. I hope she [illegible]). The first of
these examples suggests another rule (though in point of fact both that he
and hope + she - (verb + subject pronoun) - are absolute breakers so there is
no redistribution situation in these sentences). If in a redistribution
situation the left hand verb belongs to this sub-class and somewhere
between the two verbs there occurs the item that, a clause break occurs
there.


Certain important points arise out of this example. The only information
we need about the left hand verb is that it belongs to the sub-class in
question. It does not matter to what other sub-classes it also belongs.
(Few English verbs fit into only one sub-class. Any attempt to arrange
verbs - or nouns - into classes in such a way that the maximum
amount of information about their behavior is utilized results in considerable
cross-classification. Schemes indicating the combinations of sub-class
membership of verbs and nouns now recognized by the look-up routine will be
found in Appendix F.) As long as the look-up routine reveals that the verb in
question can function in this way the rule applies.

But notice that in the case of a sentence like I noticed that sack
filled with flour our present rule conflicts with the previous one. It
produces a wrong answer whereas the previous rule would produce a right
answer. As was the case with the ambiguity rules, the rules for clause
division must be ordered. It is essential that our first rule is applied
before this one.
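
The need for ordering can be made concrete with a small sketch in which the rules are tried in a fixed sequence and the first one that applies decides the break. Everything named here (the `ctx` dictionary, its keys, the two rule functions) is a hypothetical stand-in for whatever the clause division routine actually records about a redistribution situation.

```python
def ed_rule(ctx):
    """First rule: transitive -ed verb on the right with no object."""
    if (ctx["right_transitive"] and ctx["right_has_ed"]
            and not ctx["right_has_object"]):
        return ctx["right_verb"]
    return None


def that_rule(ctx):
    """Later rule: left verb takes a clause object and 'that' intervenes."""
    if ctx["left_takes_clause"] and ctx["that_index"] is not None:
        return ctx["that_index"]
    return None


def clause_break(ctx, rules):
    """Apply the rules in order; the first rule that fires wins."""
    for rule in rules:
        position = rule(ctx)
        if position is not None:
            return rule.__name__, position
    return None


# I(0) noticed(1) that(2) sack(3) filled(4) with(5) flour(6)
ctx = {"right_verb": 4, "right_transitive": True, "right_has_ed": True,
       "right_has_object": False, "left_takes_clause": True, "that_index": 2}

correct = clause_break(ctx, [ed_rule, that_rule])
wrong = clause_break(ctx, [that_rule, ed_rule])
print(correct, wrong)
```

With the -ed rule first the break falls in front of filled; with the order reversed the demonstrative that is mistaken for a clause marker and the break is placed in front of it.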

A correct solution can often be established on the evidence of tense
alone, without any use of sub-class information. Take the sentence
[...] opening the door works in my office. In any redistribution situation where
the left hand verb is a present participle and the right hand verb is present
tense the break must come immediately in front of the right hand verb.
Notice that since a present participle is also an absolute breaker the
final ordering will be [...] door/works in my office. In these
cases provision must be made for associating the two parts of the interrupted
clause.

In certain redistribution situations a solution can be reached without
reference to the verbs at all. The simplest case of this kind occurs
when the two verbs are only separated by adverbs or prepositions, as in

He dined well/read a book/ and went to bed early (note and + verb is an
absolute breaker). Notice that in this case the rule must stipulate that
only these items occur between the verbs. The occurrence of a noun, for
example, would produce wrong answers when the left hand verb belonged
to the sub-class of verbs which can take a clause as their object. In order
to keep the rules as general as possible many of them refer to items whose
presence makes the rule inapplicable. In each case the situation produced
by the presence of such an item is resolved by a later, and usually a general,
rule.

Another redistribution situation which can be analysed without reference
to the characteristics of the verbs involved is illustrated in the sentence
[...]. It will be noticed that if no nominal group that can be a subject
occurs between the two verbs (this is marked by the same criteria as object,
i.e. a noun or element of a nominal group must be found not preceded by a
preposition)* then the cut must come immediately in front of the right hand
verb. The sentence The thought [...] treated his [...] in such a cruel fashion
shocked them shows that the rule can be extended by making the condition that
there should be only one nominal group after the first preposition found
searching from right to left. The rule can be still further extended by
making the more specific restriction that the nominal group after the
preposition must not contain two or more nouns if the second is either a
plural noun or an uncountable noun (by which is meant a noun which can occur
immediately in front of a verb without either an article, demonstrative
adjective or possessive adjective before it, or a plural suffix, e.g. envy).
The sentence The thought his [...] behaved in such a [...] will now be
correctly handled by this rule. The new restriction prevents the second noun
being taken as a nominal [...]. In case of a sentence like He talked in a
manner men tended to dislike, since the second noun after the preposition is
plural the rule is inapplicable.

* There is a problem here with certain time and space expressions which
are formed like nominal groups but structure like adverbs, e.g. [...]
He came last summer. He went home. It is hoped that this
problem can be overcome by listing these phrases in the dictionary with
the assignment Noun/Adverb. A word or phrase with this assignment would not
afford an exception to this case.

This kind of situation is dealt with by another rule which makes no use
of verbal information. This rule lists all the juxtapositions of elements
indicating the coincidence of two nominal groups and commands that a clause
boundary be marked between them. The sequence indefinite article + singular
noun + plural noun which occurs in the sentence above is just one of many
examples. Others include noun + subject pronoun - He saw the book he wanted,
noun + definite or indefinite article - He saw the book the girl wanted, plural
noun + noun - He saw the books John wanted. Notice that these combinations
cannot be made absolute breakers because they only mark the occurrence of a
clause division when they occur in a redistribution situation. In cases like
In the garden the flowers were blooming there is, of course, no clause break
between the noun and the article. An absolute breaker, on the other hand, always
indicates a clause division. Notice also that none of these combinations can
be taken as diagnostic if the left hand verb belongs to the sub-class of two
object verbs (give, bring, elect, etc.) and the right hand verb is transitive
and has an -ed suffix. In all these cases there seems to be an irreducible
syntactical ambiguity. Compare:

He showed the group the operators/selected
and

He showed the group/the operators selected.

Here one needs a rule which states that under these circumstances two
analyses must be provided, though it is still doubtful whether this
distinction is necessary for information-retrieval purposes.
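
The juxtaposition rule lends itself to a table of tag sequences, each with the position at which the boundary is marked. The sketch below is an assumption-laden illustration: the tag names and the `PATTERNS` table are invented, only the four juxtapositions named in the text are listed, and the two-object-verb exception is deliberately left out.

```python
NOUN = {"NOUN_SG", "NOUN_PL"}
ARTICLE = {"DEF_ART", "INDEF_ART"}

# (sequence of allowed tag sets, offset of the boundary within the sequence)
PATTERNS = [
    ([{"INDEF_ART"}, {"NOUN_SG"}, {"NOUN_PL"}], 2),
    ([NOUN, {"SUBJ_PRON"}], 1),
    ([NOUN, ARTICLE], 1),
    ([{"NOUN_PL"}, NOUN], 1),
]


def juxtaposition_break(tags, left_verb, right_verb):
    """Search between the two verbs of a redistribution situation for a
    listed juxtaposition; return the index the boundary precedes, or None."""
    for start in range(left_verb + 1, right_verb):
        for pattern, offset in PATTERNS:
            end = start + len(pattern)
            if end > right_verb:
                continue                      # pattern would run into the verb
            window = tags[start:end]
            if all(tag in allowed for tag, allowed in zip(window, pattern)):
                return start + offset
    return None


# He saw the book he wanted
book = ["SUBJ_PRON", "VERB", "DEF_ART", "NOUN_SG", "SUBJ_PRON", "VERB"]
# He talked in a manner men tended (to dislike)
manner = ["SUBJ_PRON", "VERB", "PREP", "INDEF_ART", "NOUN_SG", "NOUN_PL", "VERB"]

print(juxtaposition_break(book, 1, 5), juxtaposition_break(manner, 1, 6))
```

Because the function is only called once a redistribution situation has been found, a sentence such as In the garden the flowers were blooming never reaches it, which mirrors the reason these combinations are not absolute breakers.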

In some cases it is necessary to look in front of the redistribution
situation. If the left hand verb has certain absolute breakers such as who or
which immediately in front of it then the break comes immediately in front
of the right hand verb: The woman who bought this house left recently.

This rule must come after the rule discussed in the preceding
paragraph if the routine is to handle correctly sentences like The woman, who
patted the dog the little boy had with him, was very fond of animals.

So far all the redistribution situations we have discussed arise when a
clause division is sought between two verbs. This is a result of the decision to
adopt the operational definition of a clause as that part of a sentence
containing one and only one finite verb. There are, of course, clauses which do
not contain verbs. (These are transforms by deletion of full clauses.)
Examples would be The professor, while still a young man, had studied in many
[...] useful. To deal with these cases we add to the redistribution rules two
special rules applied before all others. The first states that if between an
absolute breaker and a verb there occurs one and only one comma then a clause
division is to be marked as occurring at that comma. The second states that if
between an absolute breaker and a verb there occurs an adjective but no noun
then a break occurs before the right verb. Probably more rules will need to be
added to this group.

The rules are structured in such a way as to make them as flexible as
possible. Each contains the minimum of information needed to establish
the existence of a context in which a particular decision can be made. The
only other information given is a list of those elements whose presence
changes the context. A schematic representation of the rules is presented in
Appendix P. The list is not yet complete. There are two outstanding
omissions. First, in the present rules no account is taken of the special
circumstances arising when the left verb is a two object verb except to mark
those cases where its occurrence makes one of the present rules inapplicable.
Second, the rules presented here are not designed to cover those cases where
clauses are joined together by conjunctions like and or or. Considerable
problems arise in these cases since these conjunctions can join either two
words or two clauses. Cf. [...]. Most of these have been solved and rules that
establish which function the conjunction is fulfilling (and when it can fulfil
both, as in [...] noticed the [...] the girl saw me) drawn up. This work,
however, is not yet ready for publication.

m 

1.  TMb  4  RwttHtMl  fffll  tw 

The basic structure of FLEX is represented in the following diagram:

    S1  Subject              S2  Qualifiers of S1      S3 ...  Qualifiers of S2

    P1  Verb or predicated   P2  Qualifiers of P1      P3 ...  Qualifiers of P2
        adjective


Consider a standard sentence, which can take one of the following forms:

[Diagram of the standard sentence forms. Legible column headings: S1, S3, P1,
P3. Legible entries: Verb, Noun (obj.), Noun (pred.), Adjective (pred.), Noun,
Adjective, Adverb.]

Note, in passing, that the scheme as shown is incomplete, since an
adverb can be qualified by an adverb, which may in turn be qualified by a
further adverb. This is accommodated in the natural way by entering, as
usual, the new qualifier in the column to the right of the word which it
qualifies. Thus the following sentence is essentially of the standard form:
a quite exceptionally drunk man offered a violent affront to a really
stupendously slowly moving onlooker and is written:

    S1   S2     S3             S4     P1       P2       P3        P4      P5      P6            P7

    Man  drunk  exceptionally  quite  offered  affront  violent
                                                        onlooker  moving  slowly  stupendously  really


We now deal with clauses and phrases, according to the following rules:

For a given clause ascertain the FLEX rating of the word for which it
substitutes. Thus, in the sentence "to be quite exceptionally drunk offers
an affront, which can scarcely be overlooked, to what we are pleased to call
the moral fabric of our way of life," the clause "to be quite exceptionally
drunk" substitutes for a noun standing as subject of the sentence. This noun
would have been rated S1. A second clause "which can scarcely be overlooked"
stands in the same relation to "affront" as does the adjective violent rated
P3 in the earlier sentence. Finally, the clause "what we are pleased to call
the moral fabric of our way of life" substitutes for a noun (cf. "onlooker" in
the earlier sentence) which would receive the rating [...].

Representing the FLEX rating of a word as X_n, where X stands for S or P
and n stands for the numerical suffix, enter the clause according to the
following format:

[Diagram of the clause format. Legible column headings: X_n+1, X_n+2. Legible
entries: Noun (subj.), Verb, Noun (obj.), Noun (pred.), Adjective (pred.),
Adjective, Adverb.]

Comparison with the standard arrangement on page 10 will make clear
that the leading words in the clause (i.e. those which would be rated S1 or
P1 if the clause were a main clause) have simply been given the rating due
to the word for which the clause substitutes, and that the remaining words
have been anchored to these in precisely the same fashion as was done for
the standard sentence.

The treatment of phrases is simpler, the formula being

    X_n     X_n+1       X_n+2

    Noun    Adjective   Adverb

The sentence given earlier appears in FLEX as follows:

    S1     S2             S3     P1      P2       P3          P4

    drunk  exceptionally  quite  offers  affront  overlooked  scarcely

[Remaining entries, column positions not recoverable: pleased, fabric, moral,
our, way, life.]

The procedure as described is sufficiently general to extend in an obvious
fashion to clauses within clauses, phrases within clauses within phrases, etc.
Determination of the relevance of a store sentence to the input sentence.


This is based on word-by-word matching. The goodness of match between
a store word and an input word is treated as a function of two variables:

(1) similarity of FLEX rating,

(2) semantic proximity.

To illustrate the use of criterion (1), consider the two sentences

A. Socrates as a young man ran to Athens.

B. A young man from Athens ran Socrates into debt.

The words belonging to the major form classes, i.e. those to be subjected to
the matching process, have been italicized. In five cases out of six there
is perfect matching both for form class and for semantic content. Yet the
mutual relevance of the two sentences is practically nil, and should become
clear as soon as any kind of linguistic analysis is applied.

The FLEX versions are as follows:

          S1         S2       S3       P1     P2

    A.    Socrates   man      young    ran    Athens

    B.    Man        young             ran    Socrates
                     Athens                   debt

Successful use of FLEX depends upon the numerical system whereby the
FLEX ratings of the two words being matched are made to contribute to the
overall score for goodness of match. The system gives weight to the importance
of the words in relation to the rest of their sentences: a perfect match
between words which are making an unimportant contribution to their sentences
will in general have less bearing on sentence-relevance than an imperfect
match between two important words, e.g. the two subjects, or the two main
verbs. FLEX gives a rough guide, the categories being ranked for importance
as follows:

    S1  S2  S3 ...    P1  P2  P3 ...

The system we propose gives due weight not only to importance in itself but
also to degree of cross-correspondence between the FLEX ratings of the two
words. This is shown in detail in a later section.

Semantic matching.

Our basis of semantic matching is the use of a thesaurus. By a
thesaurus we simply mean a list of clusters of words, each cluster bearing
an index number. Words are assigned to the same cluster if their mutual
semantic relationship exceeds some threshold. A given word may, and indeed
often does, appear in many different clusters, reflecting the variety of
shades of meaning which one and the same word may bear when used in different
contexts. An extreme example might be the word induction, which has technical
meanings in such diverse fields as electromagnetism, biology, and logic, in
addition to a range of more loosely defined connotations when used in
ordinary speech. Ideally the next phase of the project should include the
construction of a thesaurus suitable for modern information-retrieval.

Proposed methods for constructing ab initio a more suitable thesaurus will
be explained in a later section. For testing the numerical procedures under
discussion we have been content to use Roget's Thesaurus.

The basis of our approach to semantic matching is the principle that
the more closely related semantically two words are, and the more frequent
the occasions on which they can be used interchangeably, the greater the
number of clusters which they will have in common in any rationally constructed
thesaurus. Thus, the word diet has three clusters listed in Roget's index,
with index numbers 298, 662, and 696, and the word nutritional has four clusters
with index numbers 298, 662, 656, and 707. (In using the thesaurus we treat all
derivatives of the same root, e.g. nutritional, nutrition, nutritious,
nutritive, nutrient, etc., as being variants of one and the same word.) There
are two clusters in common, indicating a rather high degree of overlap in view of
the fewness of the total number of clusters involved. A serviceable measure of
semantic correlation based on cluster overlap is given by the formula

\[ \frac{n_c}{\sqrt{n_a \, n_b}} \]

where n_c is the number of clusters common to both words, n_a is the number
of clusters indexed under word a and n_b is the number under word b. The
total number of clusters involved is automatically allowed for. Applying
the expression to the example just given, we have

\[ \frac{2}{\sqrt{3 \times 4}} = 0.58 \]

Note that the highest value attainable is 1.00, indicating that the two
words share all their clusters, and that the lowest value is zero.
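
As a check on the arithmetic, the cluster-overlap measure can be written as a short function. The sketch reads the measure as the number of shared clusters divided by the square root of the product of the two cluster counts, a reading that reproduces the value 0.58 entered for diet and nutritional in table 1; the function name `cluster_overlap` and the handling of words with no clusters are assumptions made here.

```python
import math


def cluster_overlap(clusters_a, clusters_b):
    """Shared clusters divided by the geometric mean of the two cluster counts."""
    a, b = set(clusters_a), set(clusters_b)
    if not a or not b:
        return 0.0                     # assumed convention for unlisted words
    return len(a & b) / math.sqrt(len(a) * len(b))


diet = {298, 662, 696}
nutritional = {298, 662, 656, 707}

score = cluster_overlap(diet, nutritional)
print(round(score, 2))
```

Two words with identical cluster lists score 1.00 and two with no cluster in common score zero, matching the limits noted in the text.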

91  furtfrwft  m  9rt^frtt  ilaw* 

We are now in a position to illustrate the use of the above measure in
an actual comparison of sentences in a stored text with the input sentence.

We suppose that there are present in the stored text the following two
sentences:

1. Vegetables if suitably prepared can provide all the nutritional
constituents necessary for growth.

2. It is essential to ensure transport facilities for the supply of components
to the installation.

Let us consider the following input sentence:
Do plants supply the essential components of a balanced diet?
The FLEX versions of these three sentences are given below.


            S1          S2        S3        P1       P2            P3           P4          P5

1.          Vegetables  prepared  suitably  provide  constituents  nutritional  growth
                                                                   necessary

2.                                          ensure   facilities    transport    components  installation
                                                                   supplying

Input:      Plants                          supply   components    essential    balanced
                                                                   diet

We now prepare a sentence correlation table, preserving the FLEX format, as
in table 1.

                    S1          P1          P2          P3          P3          P4
                    plants      supply      components  essential   diet        balanced

S1  vegetables      0.17        0           0           0           0           0
S2  prepared        0           0           0.09        0           0           0
S3  suitably        0           0           0           0           0           0
P1  provide         0           0.20        0.06        0           0.14        0
P2  constituents    0           0           0.23        0.13        0           0
P3  nutritional     0           0.20        0           0           0.58        0
P3  necessary       0           0           0           0.17        0           0
P4  growth          0.15        0           0.09        0           0           0

P1  ensure          0           0           0           0           0           0
P2  facilities      0           0           0           0           0           0.14
P3  transport       0           0           0           0           0           0
P3  supplying       0           1.00        0.09        0           0           0.12
P4  components      0           0.09        1.00        0           0           0
P5  installation    0.15        0           0           0           0           0

Table 1.

Upper sentence, semantic correlation =
\[ \frac{0.17 + 0.20 + \dots}{\sqrt{8 \times 6}} = 0.32 \]

Lower sentence, semantic correlation =
\[ \frac{0.14 + 1.00 + \dots}{\sqrt{6 \times 6}} = 0.43 \]
The semantic correlations between the words of the input sentence and the
words of the store sentence, calculated according to the formula previously
given, are entered in the corresponding cells of the table. It will be
noted that the sum of the individual correlations is scaled down in the
final expression, by a factor derived from the total number of words involved,
thus allowing for the influence of sentence length on the expected number
of matches.
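
The scaling can be verified from the non-zero cells of table 1. The sketch below assumes that the scaling factor is the square root of the product of the two word counts (8 x 6 for the upper sentence, 6 x 6 for the lower), which is the reading that reproduces the printed scores; the function and variable names are illustrative only.

```python
import math


def sentence_correlation(cells, n_store_words, n_input_words):
    """Sum of the word-pair correlations, scaled by sentence lengths."""
    return sum(cells) / math.sqrt(n_store_words * n_input_words)


# Non-zero cells of table 1, read row by row.
upper_cells = [0.17, 0.09, 0.20, 0.06, 0.14, 0.23,
               0.13, 0.20, 0.58, 0.17, 0.15, 0.09]
lower_cells = [0.14, 1.00, 0.09, 0.12, 0.09, 1.00, 0.15]

upper = sentence_correlation(upper_cells, 8, 6)
lower = sentence_correlation(lower_cells, 6, 6)
print(round(upper, 2), round(lower, 2))
```

On purely semantic evidence the lower (irrelevant) sentence therefore outscores the upper one.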

This example was concocted in order to illustrate the nature of the
limitations of a retrieval method based purely on semantic correlation.
Sentence 2 of the stored text was deliberately designed to show heavy
semantic overlap with words of the input sentence, in spite of bearing
little or no relevance to it. This is reflected in the final score of 0.43
as compared with 0.32. Using semantic criteria alone, the wrong sentence
would be retrieved. Information retrieval systems which depend purely on
semantic matching, without recourse to syntactic analysis, have to accept an
irreducible load of such errors as a basic limitation of their approach.

The use of FLEX as a simplified syntactic system offers at least a partial
remedy for this shortcoming, as will now be illustrated by the introduction
of FLEX correlation into the foregoing worked example.

Numerical use of FLEX for measuring relevance

Before proceeding to detailed illustration, one or two comments on
points of detail are called for.

(1) It may seem surprising that in addition to the disappearance from
the FLEX format of subsidiary words such as prepositions, connectives, etc.,
the entire clause it is essential has vanished from sentence 2, and the noun
clause, of which this was the predicate, now appears in the predicate section
of the FLEX format. This is a consequence of one of several sophistications
with which FLEX has recently been endowed, following from the discovery that
certain syntactic patterns mark "dummy" constructions which are regularly
used as equivalent to an active statement with absent or vaguely defined
subjects.

(2) The FLEX version of sentence 2 has wrongly assigned the word
installation as a modifier of components instead of supplying. This arises
from an imprecision in our current routines for assigning word groups which
we hope to remedy.

In constructing numerical measures for the goodness of match between
one FLEX category and another we have been guided by two principles, both
of which must receive due expression in the final measure of correlation.
They are,

(1) The relative importance of the category. Words appearing, for
example, in S1 or P1 are likely to contribute a much greater weight to the
relevance of the sentence to some other sentence than words appearing, for
example, in one of the lower-ranked P categories.

(2) The degree of correspondence between the categories of the two
words which are being compared. Thus if we are comparing a word in P2 with
a word in S2, the final score for the match should plainly be less than that
derived from a comparison between a word in P2 with a word in P2, or from an
S2 - S2 match. In the same way, let us contrast a P1 - P3 match with a
P2 - P2 match: although the average level of importance is the same in the
two cases, it is obvious that the latter should receive the higher score.

The numerical values at which we finally arrived are those set out in
table 2.
        S1     S2     S3     S4     S5     P1     P2     P3     P4     P5

S1      1.00   0.63   0.34   0.18   0.09   0.71   0.47   etc.
S2      0.63   0.50   0.32   0.17   0.09   0.47   etc.
S3      0.34   0.32   0.25   0.16   0.09   0.28   etc.
S4      0.18   0.17   0.16   0.13   0.08   0.16   etc.
S5      0.09   0.09   0.09   0.08   0.06   0.08   etc.
P1      0.71   0.47   0.28   0.16   0.08   1.00   etc.
P2      0.47   0.35   0.24   0.14   0.08   0.63   0.50   etc.
P3      0.28   0.24   0.18   0.12   0.07   0.34   0.32   0.25   etc.
P4      0.16   0.14   0.12   0.09   0.06   0.18   0.17   0.16   0.13   etc.
P5      0.08   0.08   0.07   0.06   0.04   0.09   0.09   0.09   0.08   0.06

Table 2. FLEX correlations (Note that the upper right corner duplicates
the lower left, and lower right equals upper left.)
They were derived from a geometric model which is appended in figure 1.

The quantities shown in table 2 are applied as multipliers in the
correlation table, as shown in table 3. The products are summed and converted
into revised scores. It will be observed that the relative ordering of the
two sentences in the stored text with respect to relevance has been reversed.

In other words, the high level of fortuitous semantic overlap between the
input sentence and the "wrong" sentence of the stored text has been
efficiently counteracted by the use of FLEX correlation. We have not yet
performed the extensive tests on specimen passages of stored text which will
be necessary before we can say just how effective and versatile the method
will prove to be, but the preliminary indications have been highly
encouraging. Various improvements have already suggested themselves, but it
would take us too far into technicalities to detail them here. (See Appendix
XIX - Concerning the Matching Formulae.)
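
The reversal can be reproduced from the non-zero cells of table 3. In this sketch each cell is a pair (semantic correlation, FLEX multiplier); the products are summed and scaled by the same length factor assumed for table 1. Two points are assumptions rather than readings: the multiplier for prepared/components is illegible in this copy and is taken here as the S2/P2 value 0.35 from table 2, and the products are not rounded cell by cell as they are in the printed table.

```python
import math


def composite_correlation(cells, n_store_words, n_input_words):
    """Sum of semantic x FLEX products, scaled by sentence lengths."""
    total = sum(semantic * flex for semantic, flex in cells)
    return total / math.sqrt(n_store_words * n_input_words)


# (semantic correlation, FLEX multiplier) for the non-zero cells of table 3.
upper_cells = [(0.17, 1.00), (0.09, 0.35),   # 0.35 assumed from table 2 (S2/P2)
               (0.20, 1.00), (0.06, 0.63), (0.14, 0.34),
               (0.23, 0.50), (0.13, 0.32),
               (0.20, 0.34), (0.58, 0.25),
               (0.17, 0.25),
               (0.15, 0.16), (0.09, 0.17)]
lower_cells = [(0.14, 0.17),
               (1.00, 0.34), (0.09, 0.32), (0.12, 0.16),
               (0.09, 0.16), (1.00, 0.17),
               (0.15, 0.08)]

upper = composite_correlation(upper_cells, 8, 6)
lower = composite_correlation(lower_cells, 6, 6)
print(round(upper, 2), round(lower, 2))
```

Rounded to two places the scores agree with the 0.14 and 0.10 given under table 3, so the relevant sentence now ranks first.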
Figure 1. Geometric model from which the FLEX correlations were constructed.
S1, P1, etc. represent words of stored text, and S'1, P'1, etc. represent
input words. The radii of successive circles are 1, 2, 4, 8, 16, etc.;
the FLEX correlation is given by the reciprocal of the distance between any two
points.
                    S1          P1          P2          P3          P3          P4
                    plants      supply      components  essential   diet        balanced

S1  vegetables      0.17x1.00   0           0           0           0           0
                    = 0.17

S2  prepared        0           0           0.09x0.35   0           0           0
                                            = 0.03

S3  suitably        0           0           0           0           0           0

P1  provide         0           0.20x1.00   0.06x0.63   0           0.14x0.34   0
                                = 0.20      = 0.04                  = 0.05

P2  constituents    0           0           0.23x0.50   0.13x0.32   0           0
                                            = 0.11      = 0.04

P3  nutritional     0           0.20x0.34   0           0           0.58x0.25   0
                                = 0.07                              = 0.15

P3  necessary       0           0           0           0.17x0.25   0           0
                                                        = 0.04

P4  growth          0.15x0.16   0           0.09x0.17   0           0           0
                    = 0.02                  = 0.02


P1  ensure          0           0           0           0           0           0

P2  facilities      0           0           0           0           0           0.14x0.17
                                                                                = 0.02

P3  transport       0           0           0           0           0           0

P3  supplying       0           1.00x0.34   0.09x0.32   0           0           0.12x0.16
                                = 0.34      = 0.03                              = 0.02

P4  components      0           0.09x0.16   1.00x0.17   0           0           0
                                = 0.02      = 0.17

P5  installation    0.15x0.08   0           0           0           0           0
                    = 0.01

Table 3.

Upper sentence, composite correlation =
\[ \frac{0.17 + 0.03 + \dots}{\sqrt{8 \times 6}} = 0.14 \]

Lower sentence, composite correlation =
\[ \frac{0.02 + 0.34 + \dots}{\sqrt{6 \times 6}} = 0.10 \]
Fully mechanized construction of a thesaurus suitable for numerical methods.

Existing scholarly compilations such as Roget's are unsuitable on
grounds of vocabulary for mechanized information retrieval. They also lack
any consistent quantitative basis for the assignment of words to clusters,
or for the delimitation of cluster boundaries. More objective methods exist
whereby clusters could in principle be constructed on the basis of a common
underlying scale of measurement. These, however, take as their starting
point our ability to give semantic proximities between pairs of words at
least a rough-and-ready ranking order. Since the meaning of words can only
be defined in terms of their significance to human beings, it would seem
that we are forced back upon our own subjective judgment, and that the
mechanization of thesaurus-making is in principle unattainable. The
enormous labor of making and arranging judgments of this type has however
already been done by generations of lexicographers, and there is no reason
why the text of an existing large dictionary should not be used as input for
a thesaurus-making computer program. We have made preliminary hand tests of
an extremely simple procedure for ordering the semantic proximities between
pairs of words, using Webster's Third New International Dictionary. The
method is again based on overlaps, this time on the degree of overlap between
the words occurring in the definitions of the two words concerned. It has
surprised us to find that the results seem as reliable as those achieved by
the subjective judgment of professional linguists. We envisage that the
fully mechanized procedure would require the following steps:

(1) The edited text of a suitable dictionary to be punched on cards.

(2) A computer program to be written which computes the overlapping
between words taken pairwise.

(3) A computer program to be written, or an existing one* utilized,
which uses the measures obtained by (2) to pack the words into a "semantic
space" of minimum dimensionality and to assign them coordinates within this
space.

(4) A computer program to be written which will "sweep" the semantic
space, gathering the words up into clusters.
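
Step (2) might look roughly as follows. The report does not state how the overlap between two definitions is to be scored, so this sketch simply reuses the shared-count-over-geometric-mean form of the cluster measure; that choice, the stop-word list, and the two toy definitions (which are invented and not quoted from Webster's) are all assumptions.

```python
import math
from itertools import combinations

STOP_WORDS = frozenset({"a", "an", "and", "for", "of", "the"})


def definition_words(definition):
    """Distinct content words of a definition (very crude tokenizing)."""
    return {w for w in definition.lower().split() if w not in STOP_WORDS}


def definition_overlap(definition_a, definition_b):
    a, b = definition_words(definition_a), definition_words(definition_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Invented toy entries, for illustration only.
entries = {
    "diet": "food and drink regularly consumed",
    "nutrition": "the process of consuming food for growth",
}

pairwise = {(x, y): definition_overlap(entries[x], entries[y])
            for x, y in combinations(sorted(entries), 2)}
print(pairwise)
```

The resulting pairwise table is the kind of input that steps (3) and (4) would then turn into coordinates and clusters.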

In order to reduce the above procedures to easily manageable proportions
it would be necessary to make a prior coarse grouping of words into
superclusters using an existing thesaurus. Our main retrieval project is in
no way critically dependent on the preparation of such an ideal thesaurus;
the proposed work can go ahead on the basis of a compilation such as Roget's
until a refined and improved version becomes available to replace it.


* R. N. Shepard of Bell Telephone Laboratories has written a program
designed for other purposes which could readily be adapted to our needs.
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2,  Outline  of  Information  Retrieval  Proeess 

A. Four tables are used.

(1) Dictionary with thesaurus-concordance. This table consists of a
list of word stems and phrases. Against each item is entered a string of
numbers referring to the clusters in the thesaurus in which the word appears.
We refer to these numbers as thesaurus references.

(2) Thesaurus with text-concordance. This table consists of a numbered
list of clusters. Against each cluster is entered a string of numbers
referring to the text in the corpus containing a word of the cluster having
high absolute information content. (See Appendix XIX, Concerning the
Matching Formulae.)

(3) Text to tape concordance.

(4) Table of FLEX correlations (as in Table 2).

B. The corpus from which it is desired to retrieve information has
its FLEX assignments and thesaurus references in each item record.

C. The machine assigns the proper FLEX labels to the items of the question
sentences.

D. The corpus sentences are brought in and matched with the question
sentence.

E. The answer paragraphs with their identification numbers are printed
out in the order of their matching scores.

The sentence matching procedure could be as follows.

Each question sentence item undergoes whatever affix stripping is necessary
to find a stem (with the same thesaurus references) in the dictionary.
Information content, the number of thesaurus references, and the thesaurus
references are noted for each word.

A composite list of thesaurus entries is made in thesaurus order. From
this list, the pertinent texts are listed in corpus order. (If necessary the
needed tape numbers could be printed out for the librarian to put on the tape
readers.) The text sentences are then matched word for word with the input
question, and a list of paragraphs, their identification numbers, and the
corresponding sentence matching numbers is then generated.

This list of paragraphs can then be ordered by means of their highest
sentence matching numbers for printing out.
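
The flow of this procedure can be sketched as follows. Everything concrete in the sketch is an assumption made for illustration: the table contents, the suffix list, and above all the sentence matching number, which is taken here to be the count of thesaurus references shared by question and text sentence. The report's own matching formulae are those of Appendix XIX and are not reproduced.

```python
# Table 1 (hypothetical contents): stem -> thesaurus references.
DICTIONARY = {"cobbler": [1, 2], "shoe": [1], "make": [2], "planet": [3]}
# Table 2 (hypothetical contents): cluster number -> text numbers.
THESAURUS = {1: [10, 11], 2: [10], 3: [12]}
# Corpus (invented): paragraph number -> sentences.
CORPUS = {
    10: ["the cobbler makes shoes"],
    11: ["shoes wear out"],
    12: ["the planet moves"],
}
SUFFIXES = ("es", "s", "ed", "ing")  # assumed affix list


def stem(word):
    """Strip one suffix if that yields a dictionary stem; else return None."""
    if word in DICTIONARY:
        return word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and word[: -len(suffix)] in DICTIONARY:
            return word[: -len(suffix)]
    return None


def references(sentence):
    """Composite set of thesaurus references for the words of a sentence."""
    refs = set()
    for word in sentence.lower().split():
        found = stem(word)
        if found is not None:
            refs.update(DICTIONARY[found])
    return refs


def retrieve(question):
    """Return paragraph numbers ordered by their best sentence score."""
    q_refs = references(question)
    # Pertinent texts, listed in corpus order.
    texts = sorted({t for ref in q_refs for t in THESAURUS.get(ref, [])})
    scored = []
    for t in texts:
        # Paragraph score = highest sentence matching number it contains.
        best = max(len(q_refs & references(s)) for s in CORPUS[t])
        scored.append((best, t))
    # Highest score first; ties stay in corpus order.
    return [t for best, t in sorted(scored, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    print(retrieve("who makes shoes"))
```

Going through the thesaurus first means that only texts sharing at least one cluster with the question are ever brought in for sentence matching, which is the point of the text-concordance table.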
IV. Conclusions and Recommendations


a. On the utility of logically structured languages. Although this
portion of the investigation was never pushed intensively, enough was done
to suggest the following: (1) that coding information in an artificial
language suitable for logical manipulations is suitable for certain special
kinds of data only, and not for the main mass of information in most relatively
discursive fields; (2) that in the very restricted areas where it is suitable,
further investigation is warranted.

b. On complete tape storage of all data. Here it is obvious that
present hardware is inadequate to permit fast access to information which is
merely known to be somewhere in a tape file containing material in the order
of billions of words of text. If the hardware problem can be solved (e.g. by
improved photoscopic discs or the like), it may become economical to develop
searching techniques such as those presented in this report.

c. On machine "learning" programs for syntax. Ellson's experiment
shows conclusively that these cannot be made to reach a useful level of
accuracy if the only form-class information available is that given in standard
published monolingual dictionaries of English. However, if dictionaries
containing a much more elaborate form-class breakdown (into 100 or more
subclasses instead of eight or ten) ever become available, it will probably be
well worth while to redo the experiment.

d. On semantic digital coding of vocabulary. Here again the research
was insufficient for firm conclusions, but it does appear that considerable
economies of storage might be achieved by such means if translation programs
could ever be rendered fully automatic.

e. On the value of a randomly selected corpus for testing. Here only
guesses are available; a priori such a corpus ought to show up any deficiencies
in MT or IR programs, but we have no evidence to prove this.

f. On the key-punching bottleneck. It is obvious that no efficient machine
program is possible without a really fast and accurate print-reading device.

g. On dictionary codes. One of the most immediate and urgent needs is
a large fully coded dictionary, i.e. one in which every potentially useful
fact of syntax and co-occurrence possibilities is indicated for every word,
except those for which the full range may automatically be determined from a
consideration of the affixes present.

h. On affix analysis. As implied in the preceding paragraph, affix
analysis routines have at least two strong utilities, both of which should
ultimately be used in any efficient IR program: (1) for reduction of the
dictionary, both in total entries and in need for separate hand coding; (2) for
automatic association of semantically linked words. The present project has
not fully exploited either of these as yet. Our reverse-alphabetized dictionary
and all other present and future reverse-alphabetized dictionaries
will be of essential utility in further work on such programs.

i. On FLEX. Here, again, we have only begun. So far, it appears
(1) that elements which are syntactically marked as heads are generally
also of more informational importance, and (2) that the grammatical subject
has a distinctly different informational role than the grammatical predicate.
It is possible, however, that for some types of transitive verbs (including
phrase-verbs consisting of intransitive verb plus preposition) the object
is informationally indistinguishable from a subject. For example, the
sentence "We enjoy spaghetti" may be equivalent to "Spaghetti pleases us".
Furthermore, it is obvious that FLEX cannot be put to use until an automatic
routine is available for determining the antecedents of all pronouns, other
than interrogative pronouns (here, e.g., interrogative "Who" will be considered
to be a perfect synonym for any personal name or description). Other projects
have so far not come up with a solution for this problem, but it must be solved.

j. On semantic matching. The need for more than indexes of synonymy
is illustrated by a sentence-pair like this: "This cobbler does poor work"
and "The shoes made by this man are inferior." Here we need to show that
"cobbler" somehow contains simultaneously elements which match both "shoes"
and "made". Work has hardly begun on this point, and much more is needed.


1. Quarterly Progress Report on Automation of General Semantics;
Fred W. Householder Jr., Principal Investigator; May 1, 1960.

2. Quarterly Report on the Automation of General Semantics; Fred W.
Householder Jr. and John Lyons, Principal Investigators; November 30,
1960.

3. Fourth Quarterly Report on the Automation of General Semantics; Fred W.
Householder Jr. and John Lyons, Principal Investigators; February 28,
1961.

4. Quarterly Report on Automatic Language Analysis; Prepared by John
Lyons; May 31, 1961.

5. Sixth Quarterly Report on Automatic Language Analysis; Fred W. Householder
Jr. and James Peter Thorne, Principal Investigators; October 31, 1961.

6. Seventh Quarterly Report on Automatic Language Analysis; Fred W.
Householder Jr. and James Peter Thorne, Principal Investigators; March
1, 1962.
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U.  —  File  nwl 


ELeok  2 


APPENDIX IV — Analyzable Unit Schema


Unit  Analysis  Rules 


Note: I = item under consideration.

P = item preceding I.

#I and #P = unit numbers of I and P.

Two dashes are said to be paired if there are no
colons or semi-colons between them and no parentheses
(or an even number of parentheses) between them.

0. In the rules which follow, if I is the 1st item in the sentence assume
#P = 0.

1. If I is a left parenthesis, #I = #P + 1.

2. If P is a right parenthesis, #I = #P - 1.

3. If I is the 2nd of paired dashes, #I = #P + 1.

4. If P is the 1st of paired dashes, #I = #P - 1.

5. Otherwise take #I = #P.
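
A minimal sketch of these unit analysis rules follows, under one reading of the partly illegible text: rule 0 takes #P = 0 for the first item, a left parenthesis raises the unit number, the item after a right parenthesis lowers it, the second of two paired dashes raises it, and the item after the first of two paired dashes lowers it. Two points are assumptions of the sketch rather than statements of the report: dashes are paired greedily from left to right, and the rules are tried in their numbered order when more than one could apply.

```python
DASH = "--"


def paired_dashes(items):
    """Return (firsts, seconds): index sets of paired dashes.

    Two neighbouring dashes are paired when no colon or semi-colon and an
    even number of parentheses lie between them. Greedy left-to-right
    pairing is an assumption of this sketch.
    """
    dashes = [i for i, item in enumerate(items) if item == DASH]
    firsts, seconds = set(), set()
    k = 0
    while k + 1 < len(dashes):
        a, b = dashes[k], dashes[k + 1]
        between = items[a + 1:b]
        blocked = any(item in (":", ";") for item in between)
        parens = sum(item in ("(", ")") for item in between)
        if not blocked and parens % 2 == 0:
            firsts.add(a)
            seconds.add(b)
            k += 2
        else:
            k += 1
    return firsts, seconds


def unit_numbers(items):
    """Assign a unit number to every item of one sentence."""
    firsts, seconds = paired_dashes(items)
    numbers = []
    prev = 0  # rule 0: for the first item assume #P = 0
    for i, item in enumerate(items):
        if item == "(":                        # rule 1
            number = prev + 1
        elif i > 0 and items[i - 1] == ")":    # rule 2
            number = prev - 1
        elif i in seconds:                     # rule 3
            number = prev + 1
        elif (i - 1) in firsts:                # rule 4
            number = prev - 1
        else:                                  # rule 5
            number = prev
        numbers.append(number)
        prev = number
    return numbers


if __name__ == "__main__":
    print(unit_numbers(["A", "(", "B", ")", "C"]))
    print(unit_numbers(["A", DASH, "B", DASH, "C"]))
```

Under this reading, parenthesized material sits one unit above the surrounding text and material between paired dashes one unit below it, so both kinds of insertion are set apart from the main unit.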


Pictures

I = (                       #I = #P + 1
P = )                       #I = #P - 1
I = 2nd of paired dashes    #I = #P + 1
P = 1st of paired dashes    #I = #P - 1
otherwise                   #I = #P


APPENDIX V — Phrasification Flow


APPENDIX VI — Form Subclass Codes


Noun Subclass Codes


If an item belongs to the noun form class, then 03, 1|), 17, 22, 23,
and 25 indicate that the item is a non-countable item. The following
table shows what noun subclasses are indicated by 01, 03-20, and 22-25.

An "X" in a column opposite a number indicates that the subclass indicated
by that number has the property named at the top of the column. The lack
of an "X" indicates that the subclass does not have that property.


COONTABIE  llKflQnMTIHr.m  proper  MASCIiLmE  FEMINIMR  NEl 
APPENDIX VI — (Cont'd)


Verb Subclass Codes


The following table is analogous to the one used to explain noun
subclasses:

traris-  b/V  a  transltixeaHP  transitive  Intrans-  copul-  transitive 
itiva  verb  a  two  objects  4to»yerb  Itlve  atlve  ■fad.leetlve 


01 
APPENDIX VI — (Cont'd)


Pronoun Subclass Codes

If an item belongs to the pronoun form class, then:

01 = object pronoun
02 = subject pronoun
03 = object/subject pronoun ambiguous item
04 = indefinite pronoun
05 = possessive pronoun
06 = reflexive pronoun
07 = English numeral


Adjective Subclass Codes

If an item belongs to the adjective form class, then:

01 = indefinite article
02 = Arabic numeral
04 = definite article (a no)
08 = possessive adjective (including his)
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tmx  of  Itaa 
lndieat«d 

1 

nuMLM  dUsslflMtion 
aeeordl^  td  otrUl^ 

j  ■  typa  of  onoartatnty 

Irregular form

0 = not irregular form
1 = past tense other than 7 or 8
2 = past/present ambiguous
3 = plural
4 = singular/plural
5 = comparative
6 = superlative
7 = past participle only
8 = past tense only

Priority code

1 = noun
2 = verb
4 = adjective
0 = no priority

Absolute breaker code

1 = class B breaker
3 = class A breaker


APPENDIX — Affix Dictionary Flow


APPENDIX — Indicator Flow Chart


APPENDIX XI — The Indicator Codes


0.  V,  V/A.  1.21 

1.  plural  (N).  1.33 

2.  Cap.  ^  beginning  of  sent.,  subj.  Pr. ,  1.1,  2.1,  8.1,  +1,  -1,  Z 

3.  V  subclass  05,  08.  17,  4.l6 

4.  singular  countable  (N).  1.42,  1.44,  2.26,  8.10,  8.29 

5.  V,  subject  Pr.  Z 

6.  present  tense  +0.  A,  AC,  Z 

7.  thought .  3t»ke.  +14 

8.  Past  (V)  or  past  V  and  past  participial  adjective.  +25,  -5,  Z 

9.  auxiliary,  modal,  copulative  V.  1.12,  1.24,  2.6,  2.18,  4.1,  4.7,  8.10, 
8.24,  8.26,  +12 

10.  "Believe"  Verbs.  1.30,  1.40 

11.  (N).  -ing.  |.16 

12.  Intransitively.  1.15 

13.  A,  V  »  V  subclass  06,  25,  27,  8.13 

14.  present  V,  auxiliary,  modal,  copulative  V.  2.7 

15.  part  of  appear,  consider.  2.19 

16.  tvro  object  V  1.23,  1.28,  4.16,  8.13,  +21 

17.  V  part  of  begin,  continue .  end,  finish,  open,  stand,  start,  stop.  2.33 

18.  V  part  of  begin,  continue,  help,  keep,  lie,  send,  sit,  stand,  start. 
stop,  try;  worth.  2.40 

19.  consisting,  dying,  sneaking,  talking,  telling,  thinking  2.21 

20. V part of ascribe, attach, attribute, belong, cling, commit, convert,
oppose, pertain, reconvert, relate, subject. 1.9, -12

21. (as nominals) according, alien, amenable, antagonistic, attributable,
basic, complementary, contradictory, convert(s), equivalent, ff.,
foreign, hostile, inimical, liability, opposite, proportion(al),
regard(s), resistant, respect(s), sensitive, similar, supplementary.
1.9, -12

22. exercise(s), method(s), procedure(s), technique(s) as N's. 2.9

23. accident, aim, art, capable, certainty, custom, device, difficulty,
ease, effect, feasibility, habit, hope, hopelessness, idea, impossibility,
incapable, interest, means, method, necessity, object, obligation,
possibility, practice, problem, purpose, question, re^t, rule,
sake, and their plurals. 2.30

24. cure, device, facility, fame, flair, machine, necessity, need, notoriety,
notorious, reason, talent and their plurals. 2.32

25. attributable, attribute(s)(d), belong, opposed, opposite, proportion(al),
similar, subject(ed). 8.27

26. V ≠ (ascribe, attach, attribute, belong, cling, commit, convert, oppose,
pertain, reconvert, relate, subject). -6

27.  modal,  I,  we,  you,  they,  who.  -6 

28.  A,  V  =  subclasses  of  trans.  V  -f  to  -f  V. 

29.  (N),  A,  Pr.  2.34 

30.  singular  N.  I.36 

31.  plural  N.  1.21,  1.23,  2.12,  8.12,  +22,  Z 

32.  singular  countable  N.  1.19,  1*20,  1.29,  1.41,  2.22,  4.15,  +18,  +20 

33.  N,  (A)  1.15 

34.  article,  Pr/A,  possessive  N  or  A  2.2 

35.  A,  N,  subject  or  indefinite  Pr.  who,  whoever,  no  one,  it.  you  4.13 
APPENDIX XI (con't)

0.  and,  nor,  or 

1.  N,  object  Pr.  jgu,  4'.l6 

2.  possessive  A,  whose.  2.20 

3.  present  V  +s,  P.  -3 

4.  V,  P.  1.18,  8.20,  8.21,  +16 

5.  V,  P,  article.  8.15 

6.  V,  P,  poss.  A.  4.21 

7.  V,  P,  C.  +20 

8.  V,  N.  1.43 

9.  ing  or  past  form  adjective,  1,11,  +20 

10.  D,  A/D,  a  ^  (article,  possessive  A).  XL00K4 

11.  poss.  N,  A,  Pr/A.  1.28 

12.  A  ^  Ing.  1.3,  2.5, 

13.  V,  N,  article,  Pr.  Z 

14.  A,  poss,  Pr. ,  his.  -13 

15.  N,  A.  1.13,  1.20,  2.29,  +25,  Z 

16.  D,  Pr/A.  4.24 

17.  D,  A/D  XLOOK2,  Z 

18.  D,  A/D,  Pt/a  another,  each,  no  one,  one,  this:  few,  many,  several. 
these,  those  or  numeral).  XL0OK3 

19.  down,  except .  like,  near,  till.  -9 

20.  poss.  N  or  A,  article.  1.2,  2,28,  4.4,  8.3,  +2,  -6 

21.  V,  sub.  Pr, ,  it,  Z 

22.  V,  A.  1.9,  +27 

23.  poss,  N  or  A  or  Pr,  whose.  4’,5,  +1? 

24.  V  ^  (modal,  auxiliary),  N,  Pr/A,  P,  obj.  or  Indef,  Pr. ,  Z 
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25.  article,  poss.  A,  Pr/4,  indefinite  fr, ,  whoever,  no  one.  8.13 


26. 


27. 


28. 


29. 


30. 


31. 

32. 


down,  near.  «tin .  well.  -5 
Pr/A.  2.37,  4.21 

art.,  A/Pr. ,  poss.  Pr. ,  his,  whose,  that,  whioh  1.29 
another,  each,  no  one,  pne.  this  1.18,  ■*>15 
Pr/A,  that  -*15,  +16 
Pr/A,  more,  most.  8.15 

another,  each,  no  one,  one,  thist  few,  many,  several,  these.  these^> 
i.l6 


33.  another,  any,  each,  much,  no  one,  one,  other,  such,  this.  8,8 

34.  poss.  or  refl.  or  obj.  Pr. ,  it.  yoti.  8.7 

35.  article,  A/Pr,  indef.  or  refl.  Pr,  more,  most,  very.  +24 
APPENDIX XI (con't)


0.  with,  between  2.31 

1.  that,  which.  4.24 

2.  more,  most.  8.31 

3.  most,  too.  2.25 

4.  comparatively,  strikingly,  too,  very;  8.4 

5.  still,  well.  -7 

6.  art.,  subj.  or  indef.  or  poss.  Pr,  poss.  A,  plural  N,  proper  N,  verb, 
more,  most,  that  Z 

7.  article.  1.29,  2.3,  2.22,  4.2,  4.3,  -3,  -5 

8.  like,  except.  -6 

9.  since,  until  Z 

10.  P  ^  to.  1.8,  2.24,  8.12,  +9,  Z 

11.  P  ^  as.  4.9 

12.  (N),  A.  4.20,  4.25 

13.  symbol,  Arabic  numeral.  1.24 

14.  symbol,  Arabic  or  written  numeral.  4.23 

15.  subj.  or  indef.  or  poss.  Pr,  who,  whoever.  1^,  you.  1.5 

16.  Arabic  or  written  numeral.  1.29,  1.30,  1.33,  +23 

17.  object  or  indefinite  or  poss.  Pr,  whoever,  no  one,  it.  you  1.6 

18.  object  or  indef.  Pr,  no  one.  jqju.  8.19 

19.  object  or  indef.  Pr,  more,  most.  j^u.  2.11 

20.  object  or  indef.  Pr,  no  one.  3[2U.  1.27,  4.10,  +3 

21.  N,  object  or  indefinite  Pr,  JSS.*  iit*  ■*'21 

22.  subj.  or  indef.  Pr,  whoever,  no  one,  it, 

23.  subject  or  indefinite  Pr,  »*£,  whoever,  ^t,  jou.  +5 

24.  N,  A/Pr,  indef.  or  subj.  Pr,  U,  223i« 
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25.  plural  N  or  A/Pr  or  subj  Pr,  you.  AC 

26.  Indef.  or  poss.  Pr,  poss  A,  present  A,  dictionary  A,  suoh.  sense. 
reason.  Z 

27.  Indefinite  Pr,  who,  whoever,  no  one.  2.13,  2.14 

28.  A,  singular  countable  N.  Z 

29.  indef.  Pr,  who,  you.  8.20 

30.  object  or  indefinite  Pr.,  ^t,  you.  2.15 

31.  although,  beoause.  since,  though,  whent  how,  ^f,  until,  where,  whereas. 
whether,  whilst 1  lest,  till,  unless,  whenever,  wherever,  why.  1.25 

32.  although,  beoause.  since,  though,  whent  ^s,  hgw,  however,  ones,  so, 
that,  thus,  what,  whatever,  wherever.  4.19 

33.  after,  before,  besides,  since,  until.  Z 

34.  although,  beoause.  since,  though,  whent  how,  ^f ,  until,  where,  whereas. 
whether,  whilst,  whilst  what. 4,22,  8.5 

35.  after,  before,  slnoe.  until.  Z 


APPENDIX XII


The Search Routines

The ambiguity resolution routines require information about preceding
items and about following items. Before such an item can be interrogated the
programmer must first determine if there exists an item to be interrogated.

Usually the programmer considers that there is no such item in unit if
punctuation is encountered. Therefore all of the routines exit NO WORD FOUND
whenever punctuation intervenes between the item with which the search begins
(the search item) and the item looked for. For example, suppose we want to
test the item in front of the search item. If the item in front of the
search item is a comma the exit is NO WORD FOUND, no matter how many words in
unit there are preceding the item. However, the address of the preceding (or
following) item is always given, whether it is a punctuation mark or not.
(This is to handle certain special cases.) If it is punctuation, the address
word has a negative sign.

The calling sequence for each search subroutine has the following format.

      TSX  XLOOKY,4
A     PZE  ATNR
B     PZE  NWF
C     PZE  **

Here X = P for searching left for a preceding item; X = F for searching
right for a following item.

Y = 1 means nothing is ignored,

Y = 2 means adverbs and adjective/adverbs are ignored,

Y = 3 means adverbs and adjective/adverbs and pronoun-adjectives
≠ (this, etc.) are ignored,

Y = 4 means adverbs and adjective/adverbs, and adjectives ≠ (article,
possessive) are ignored.

For example, FLOOK2 looks for the first following word which is not
an adverb or an adverb/adjective.

Transfer is made indirectly to the address (say) "NWF" in "B" if
NO WORD FOUND.

If WORD FOUND (i.e. a word which the programmer will want to test)
transfer is made directly to "C" + 1.

"C" is always filled in by the search routine. Minus zero is stored
there if there are only ignored items between the search item and the beginning
or end of the unit, depending whether a PLOOK or a FLOOK is used. If
a punctuation mark is found, exit is made to "NWF" and its address is
placed in "C".

"A" must be filled in by the programmer. The programmer must store
in "A" the first address of the text item record. "A" will always be
assembled as PZE **.
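
The behaviour of the search routines can be sketched in a modern language. This is not the IBM 709 code: the `Item` record, its tag names, and the word list for "this, etc." are assumptions made for the illustration. The two exits are modelled as a returned pair, with `(False, None)` standing in for the minus-zero case in which only ignored items lie between the search item and the unit boundary.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Item:
    text: str
    tag: str  # assumed tags: "D", "A/D", "Pr/A", "A", "P", "N", "PUNCT", ...


# Pronoun-adjectives that level 3 does not skip; membership is assumed.
KEPT_PRONOUN_ADJECTIVES = {"this", "each", "another", "one"}


def is_ignored(item: Item, level: int) -> bool:
    """True if a search at ignore level Y = level passes over this item."""
    if level == 1:
        return False
    if item.tag in ("D", "A/D"):  # adverbs and adjective/adverbs
        return True
    if level == 3:
        return item.tag == "Pr/A" and item.text not in KEPT_PRONOUN_ADJECTIVES
    if level == 4:
        # Articles and possessives are assumed to carry tags other than "A".
        return item.tag == "A"
    return False


def look(items: List[Item], start: int, step: int,
         level: int) -> Tuple[bool, Optional[int]]:
    """Search from `start` in direction `step`; return (found, position)."""
    i = start + step
    while 0 <= i < len(items):
        if items[i].tag == "PUNCT":
            return False, i   # NO WORD FOUND; position is still reported
        if not is_ignored(items[i], level):
            return True, i    # WORD FOUND
        i += step
    return False, None        # only ignored items up to the unit boundary


def plook(items: List[Item], start: int, level: int):
    """Search left for a preceding item."""
    return look(items, start, -1, level)


def flook(items: List[Item], start: int, level: int):
    """Search right for a following item."""
    return look(items, start, +1, level)


if __name__ == "__main__":
    unit = [Item("in", "P"), Item("some", "Pr/A"),
            Item("very", "D"), Item("work", "W")]
    print(plook(unit, 3, 3))
```

Reporting the position even on a NO WORD FOUND exit mirrors the report's remark that the address of the neighbouring item is always given, punctuation or not.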
APPENDIX XIII — Typical Ambiguity Routine Rule (noun/verb present tense)


Rule C

Is the ambiguous word preceded (ignoring adv/prep and adj/pronouns) by a
preposition? If yes assign as a noun. If no, or if no word precedes, apply
next sequential rule.*


*P3 = PLOOK3 (see Appendix XII - The Search Routines)
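
The "apply next sequential rule" convention amounts to a rule cascade, which can be sketched as below. The sketch is heavily simplified and its details are assumptions: items are reduced to a word list with a parallel tag list, the set of skipped tags only approximates what PLOOK3 ignores, and the second rule (a following "of" marks a noun) is included only to show the fall-through.

```python
IGNORED_TAGS = {"D", "A/D", "Pr/A"}  # assumed stand-in for what PLOOK3 skips


def preceding(tags, i):
    """Index of the nearest preceding non-ignored item, or None."""
    j = i - 1
    while j >= 0 and tags[j] in IGNORED_TAGS:
        j -= 1
    if j < 0 or tags[j] == "PUNCT":
        return None  # no word precedes within the unit
    return j


def rule_c(words, tags, i):
    """Rule C: a preceding preposition makes the ambiguous word a noun."""
    j = preceding(tags, i)
    if j is not None and tags[j] == "P":
        return "NOUN"
    return None  # no decision: apply the next sequential rule


def rule_followed_by_of(words, tags, i):
    """Illustrative second rule: a following 'of' makes the word a noun."""
    if i + 1 < len(words) and words[i + 1] == "of":
        return "NOUN"
    return None


RULES = [rule_c, rule_followed_by_of]


def resolve(words, tags, i):
    """Try the rules in order; return the first decision, or None."""
    for rule in RULES:
        decision = rule(words, tags, i)
        if decision is not None:
            return decision
    return None


if __name__ == "__main__":
    print(resolve(["in", "very", "work"], ["P", "D", "W"], 2))
```

Each rule either decides or declines, so the order of the list is the order of precedence, as in the numbered rules of the routine.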
Present Tense Resolution

Note: W = word, P = previous word, F = following word. (P) = P
ignoring adverbs and adverb-adjectives. /P/ = same as (P) but also ignoring
adjective-pronouns except this, each, another, one, these, those, several, many,
few, no one, and English numerals.


1. Is W capitalized and not the 1st word in the sentence? If yes, take W
as NOUN.

2. Is /P/ an article, possessive N, or possessive adjective? If yes, take
W as NOUN.

3. Is (P) an adjective ≠ -ing? If yes, take W as NOUN.

4. Is (P) a modal verb? If yes, does W have s as affix (final)? If yes,
take as NOUN; if no, take as VERB.

5. Is (P) a subject, indefinite, or possessive pronoun, you, it, or
whoever? If yes, take as VERB.

6. Is (F) an object, indefinite, or possessive pronoun, you, it, no one, or
whoever? If yes, take as VERB.

7. Is (F) = of? If yes, take as NOUN.


8. Is (P) a preposition ≠ to? If yes, take as NOUN.

9. Is P = to? If yes, is W a+s (i.e. has it the affix s)? If yes, take as
NOUN; if no, see if PP = (as nominals) according, alien, amenable,
antagonistic, attributable, basic, complementary, contradictory,
convert(s), equivalent, foreign, hostile, inimical, liability, opposite,
proportional, regard(s), resistant, respects, sensitive, similar, or
supplementary, or if a previous verb or adjective is part of ascribe,
attach, attribute, belong, cling, commit, convert, oppose, pertain,
reconvert, relate, subject. If yes, take as NOUN. If no, take as VERB.


10. Is (P) = to? If yes, is W a+s? If yes, take as NOUN.

11. Is P = that? If yes, is (P)P a verb? If yes, take as NOUN. If no, is
(P)P an -ing or -ed adjective? If yes, is (P)(P)P an auxiliary?
If yes, take as NOUN. If no, is PP a preposition? If yes, is W a+s? If
yes, take as VERB; if no, take as NOUN.

12. Is (F) an auxiliary, modal, or copulative verb? If yes, take as NOUN.

13. Is (F) a past tense/adjective? If yes, is (F)(F) something which may be
a noun or an adjective? If no, take as NOUN.

14. Is (F) a verb? If yes, is W intransitive as verb? If yes, take as NOUN.

15. Is (P) an intransitive verb? If yes, take as NOUN.

16. Is F a candidate noun or an -ing? If yes, is (F)F a verb? If yes, take
as NOUN.

17. Is (P) = to? If yes, take as NOUN.

18. Is (P) = this, each, another, one, no one, these, those, several, many,
few? If yes, is P(P) a preposition or verb? If yes, take as NOUN; if no,
is W a+s? If yes, is (P) = this, each, another, one, no one? If yes,
take as VERB; if no, take as NOUN. If W is not a+s, is (P) = this, each,
another, one? If yes, take as NOUN. If no, take as VERB.

19. Is P a singular countable noun? If yes, are there (if anything) only
adverbs, adverb/adjectives, adjectives ≠ (article, possessive adjective),
and other singular countable nouns between P and the next Previous
punctuation? If yes, take as NOUN.

20. Is W a+s? If no, is F a singular countable noun? If yes, is FF a noun
or an adjective? If no, take as NOUN.

21. Is F a plural noun? If yes, is (F)F a verb or verb/adjective? If yes,
take ambiguous item as NOUN.

22. Is P a noun? If yes, is (ignoring adjectives also) (P)P a verb? If yes,
take W as NOUN.

23. Is (P) a plural noun? If yes, is Previous Verb a two-object verb? If
yes, take W as NOUN; if no, take as VERB; if no previous verb, take W
as VERB.

24. Is (W) first word in unit? If yes, has W +s? If yes, take W as NOUN;
if no, see if F is an Arabic numeral or other symbol. If yes, take W
as NOUN. If no, is there a Following Verb? If no, take W as VERB;
if yes, see if verb is auxiliary, modal, or copulative. If yes, take
W as NOUN.

25. Is P = although, though, because, how, if, lest, unless, since, till,
until, when, whenever, where, wherever, whether, while, whilst, whereas,
or why? If yes, take W as NOUN.

26. Is (F) an adjective? If yes, is F(F) a preposition? If yes, take as
NOUN.

27. Is (P) an object or indefinite pronoun or no one? If yes, is (P)(P) =
let? If yes, take as VERB.

28. Is (F) an adjective, adjective/pronoun, or possessive noun? If yes, is
there a verb between W and the Previous punctuation? If yes, is it a
two-object verb? If no, take as VERB. If there is no verb, take W as
VERB.

29.  I8  (P)  ■»  Bnrilsh  nwBBral?  If  y»««  1®  y®®» 

eowUiae  Sto?  If  y88.  is  PP(P)  an  artld#?  If  ;^*  ■’^*  ** 
y®8,  or  if  P(P)  i8  not  a  singtilar  oomttaWLe  notm,  take  as  HOW. 

30.  Is  (P)  »  that?  If  yes,  is  (F)(P)  *  ^idat^rt?  If  *][•*’* 

«hidi  takas  a  that~<JLansa  as  objaet  (i.e.  a  •beliare  Ttrt)/  If  JO  t 

take  as  VERB. 

31. Is (P) an English numeral? If yes, take W as VERB.

32. Is W capitalized? If yes, take as NOUN.

33.  Is  (P)  an  English  ntaeralT  If  yes.  is  P  a 

P  a  plT»al  nonn  candidate?  If  yes,  take  Jf  4,  . 

a  plSal  noun  candidate?  If  yes  take  as  if  nc,  see  iX  PCP^s  s 

si^ar  countable  noun.  If  no.  take  as  VERB,  if  yes.  take  as  HOW. 

34. Is W a+s? If no, is (P) an adjective? If yes, take as NOUN.

35. Is W a+s? If yes, is (P) a noun? If yes, is P(P) an indefinite article?
If yes, take as VERB.

36. Is W a+s? If yes, is P a singular noun? If yes, is F a singular noun?
If yes, take as VERB.

37. Is W first word in unit? If yes, take as NOUN.

38. Is P an adjective? If yes, take as NOUN.

39. Is P = that? If yes, take as VERB.

w.  1.  <F)  -  If  j...  1.  r(F)  • 

if  no,  see  if  W  is  subclass  07,  08  10-12,  14,  '•9,  2^  ^E*b4^** 

^U^vr"  verts).  If  no.  take  as  HOW.  If  yes.  take  as  VERB. 

41  Tgp«»^"g  everything  but  punctuation  .and  verts  find  Preview  proposiWon. 

’  tF^d  weifms  is  only  Pollw^d  bf  singular  countable  wwns  or 

!SjfcSJ;sT(irtMf.  rtSSsIve  adjectiver  If  F-s.  take  as  HOW. 

42  Is  (P)  a  eonjunctinn?  If  no.  go  to  AHD  CWCORD.  If,P®8.  is  ? 

*  rttion?  If^s,  Jdgh  W  be  a  CWcSd*  ^ 

VERBt  if  no,  .see  priority.  If  no  priority,  go  to  AHD  CCMCORD, 

43. Is P a comma? If yes, is F a comma? If yes, is PP a noun or verb?
If yes, take same as PP.

44.  Is  P  a  coanst?  If  yes, is  F  a  preposition?  J®  'J 

Qountable  noun  eaislldate?  If  yes,  take  as  VBBj  if  no,  see  priority. 
CONCORD


I. If there is a priority, take accordingly as NOUN or VERB.

II. Find Preceding verb, absolute breaker, or punctuation, or beginning of
unit; call this (P.B.). Ignoring everything else, find first subject*
or preposition Following this (P.B.); call this S. If S is a preposition,
take W as NOUN. If S is a subject, go to A. If neither is
found, go to III.

A. Is (P.B.) a verb? If yes, is W a+s? If yes, is (F) a plural noun
candidate? If yes, take as VERB. If no, is (F) a candidate verb?
If yes, take as NOUN; if no, go to part A of AND CONCORD.

1. Is (P.B.) of subclass 08, 09, 12, 19-22, 26-30 (trans. + to +
verb)? If yes take as VERB. If no, is (P.B.) an auxiliary?
If yes, is (F)(P.B.) an adjective with the above verb subclasses?
If yes, take as VERB. If no, go to AND CONCORD.

B. Does and, or, or nor occur between (P.B.) and W? If no, is S a
plural? If yes, is W plural as noun? If yes, take as NOUN; if no,
take as VERB. If S is not a plural, see if W is plural as noun.
If yes, take W as VERB; if no, as NOUN.

III. Take as NOUN.


* subject = noun, subject pronouns (^3), indefinite pronouns and
adjective/pronoun.


AND CONCORD


Is there a Preceding Verb, P.V., in the unit? If no, take as NOUN.
If yes, is P.V. a present tense + s? If yes, is W a+s? If yes, take as
NOUN; if no, take as VERB.

A. Find the first subject, S, Preceding P.V.

1. If no subject found, take W as NOUN.

2. If subject found, look for preposition, P, Preceding S
(ignoring everything but prepositions, verbs, punctuation,
and absolute breakers). If no P, do concord* with this subject.

3. If P found, look for first subject preceding it and go back
to step 1.


*concord

plural subject + W a+s → W = NOUN
non-plural subject + W a+s → W = VERB
plural subject + W not a+s → W = VERB
non-plural subject + W not a+s → W = NOUN
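
The concord table reduces to a single comparison: W is taken as a noun when its final s agrees with a plural subject or its absence agrees with a non-plural subject, and as a verb otherwise. A minimal sketch, with the function name and boolean arguments chosen here for illustration:

```python
def concord(subject_is_plural: bool, w_has_s: bool) -> str:
    """Resolve the ambiguous word W against its subject by the concord table."""
    # plural subject + W a+s         -> NOUN
    # non-plural subject + W a+s     -> VERB
    # plural subject + W not a+s     -> VERB
    # non-plural subject + W not a+s -> NOUN
    return "NOUN" if subject_is_plural == w_has_s else "VERB"


if __name__ == "__main__":
    for plural in (True, False):
        for has_s in (True, False):
            print(plural, has_s, concord(plural, has_s))
```

The two VERB cases are the ordinary present-tense agreements of English: a non-plural subject with a verb in -s, and a plural subject with a verb without it.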
Present Tense/Noun Ambiguities Pictures


Note: "+" = immediately next word; "(+)" = next word disregarding
adverbs and adverb-adjectives; "/+/" = same as (+) but also
ignoring all adjective-pronouns except this, each, another, one,
etc.; "W" = ambiguous item; ( ) around anything except + means
"candidate for"; "N" = NOUN; "V" = VERB; "A" = adjective; "D" =
ADVERB; etc. "#" = unit boundary.

1.  W  >  eapitaliaed  4  lot  word  in  sentenee^V  ■  HOUR* 


2,  art, 

pess.  R  >  /4/tf  4W  >  HOOH 

poss.  kj 

3.  A  ^  ing  (4)  V4  W  -  ROIH 

A.  Hsdal  V  (41 W  «  a  4£d  W  *  ROOH 
Hodal  V  (4/  W  ^  ».*»  4  V  «>  ROOH 

5,  stJbj,  _ 

indef,  >  Prcn,  yon.  who,  wheewer  (4)  WtW  «  VERB 
pass,.  J  '■ 

6,  ,obj. 

W  (4)  indef,  >  yen,  it*  jAojnop,  no  ene^W  =  VBRB 
poss,  J 

7,  W  (4)  of^W»  HOOH 

8,  Prep  4  ^  (4)  V^W  -  ROOH 

9,  ^  4  W  p  a4;8^vi|R0IJH 

aeoording,  etp,  4  ^  4  W  fit  a4s  >4W  «  ROOH 

^  ■  aseribe,  eto*  4,,,4te  4  W  ft  4  s  ♦W  ■  ROOH 
A 

to  4  V  ft  a48^W  »  TBS 

10,  to  (4)  W  ■  ll^S  4W  e  ROOH 

””  '  J* 

n,  V  (4)  that  »  ROWR 

axR,  V,  (4)  A  «  "ed*  or  "ing*  (4)  that  4  W^eW  »  HOOi 
lYop,  4  that  4  W  »  a48^W  m  "WIB 
nwp,  4  thai  4  W  ft  a4o^V  •  ROQR 

12,  atoc,  ^ 

V(4)  nodal  i  V4W  «  ROOH 
ooptl,  j 

13,  V  (4)  *ed*  (4)  i  ROOH  or  iDJ,^W  «  ROOH 

14,  W  «  intnm,  (V)  (4)  VBIB’^W  *  ROOT 
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15.  V  -  iatnxu  (•(■)  V^V  »  MOON 


(4-)  y4W  «  SOON 


17.  to  (•>•}  V^W  •  HGOR 

18,  P*op 

y  j  +  thio.  oto.  (+)  ■  lom 

this,  oto.  (4-)  W  «  »f8  •  VBB 

tibooe.  oto,  .(4>)  V  >  ftft  ^  ¥■  HODH 
tH'i.  oto,  (4>)  V  4  ^  V  «  HOOI 

thiso.  oto,  (•••)  V  4  *■  ySffl 


19. 


a 


panetoatlon  ^  S  AOy/ADJ 

(^A  4  ort..  pooo.  A.. 


•lag,  ofSBi  I  *  V^W  «  ion 


ion 


ion 


20.  ¥  a*  040  4-  sing,  oemt  I  4-  ^ 

a.  ¥  4-  pltar,  I  (4-j^  ^  ^ 

22.  y  (ignoring  A's)  (4-)  I  -f  ¥  -^9  ¥  «  lOn 


23. 


y  e  tno  ebj, 
y  ^  too 
4  nsB 


.  I^nr.  I  (4-)  ¥ap¥  «  ion 
•  •  plnr,  I  (4>)  ¥^¥  »  VERB 

plnr.  I  (4')  ¥^¥  »  ySB 


2b.  #  (4-1  ¥  *  *42  =»  ¥  «  ion 

#  (4-)  ¥  4  »ts  (ml 

^  (4-)  ¥  f  *40  •  '^?®i^no  y  ooottra  •  ,  ,  #  :^  ¥  «  ySB 
(*nx, 

#  (4)  ¥  ^  *40  •  .  .(aodal  ]  y  ^  ¥  «  lOn 

"  (oopa,  J 

25.  althonih.  «t.  4  ¥^¥  «  |0n 

26.  ¥  (4)  A  4  Prop^¥  «  lOn 


27. 

28. 


flbj.  1 

lot  (4)  j  P*«n,  no  on*  (4)  ¥f¥ 

fy  too  obj, 

pnetn*ti«i  ...  oeonrol 


ym 

¥  (4)  (  k/?tm  1 

.  (jpOM.  ij 


•i^  ¥»  mB 
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.  /w™  I 

29.  4  art.  ^  sing,  oowt.  H  ^  ^  W  »  VHtB 

4  art.  H*  Bing,  aomt,  N  +  1  (+)  W  «  a+s  »  VBD 
art,  +  slug,  oottnt.  N,  +  HtoiBral  (+)  »  W  »  a42  *#»V  «  KOOH 
XtoBsral  (+)  Vf  *»  a+8  W  »  BOIBI 


30.  W  =  a  •b*lisve*(+)  tJwt  (+)  (T)  ^  r  =  VERB 

31.  W  (•)*)  NtoBBral^W  «  VERB 

32.  W  »  eapltallscd  4>  W  »  NOON 


33.  NTai»ral  (+)  W  +  plaral  (K)  ^  W  «=  NOON 

Ntnsrsl  i+)  W  (N)  -f  plural  (N)  «  NOOK 

^  sing,  ooxmt.  N  +  Nutisral  (■<■)  W  »  VBtB 

sing,  oeunt.  N  4  Nuneral  (.•¥)  W  a  NOOK 

34.  A  (+)  W  ^  a+8  W  a  NOON 

35.  tndef.  art,  +  N  (+)  W  »  i+e  W  «  VISB 


36. 

3?. 

3«. 

39. 

40. 

41. 

42. 


sizig.  N  +  W  a  a+8  +  aitig.  N  ■■^  'n  ~  VHtB 
#  +  W  W  a  NOON 


A  +  W  ^  W  a  NOON 

that  +  W  ?•  W  a  VI5RB 

W  (+)  that  +  prep.  W  «  VERB 

W  V  subd.  abaxiwivft*  ''+)  that  +  prsp.  W  »  NOOK 


punctuation^*  *  *  *  i 


(  A  poss.  A,  art.',  etc 


losing,  count,  nouna 


! 


+  W  »  NOOK 


Oonj,  (+)  W  =  sing,  ooun't,  (5)  +  prop. ’^W  «  VERB 

ooiij,  (+)  W  4  sing,  counts  (ft)  +  prop,  ^  Qo  aooovdin^  to  priority 

(No  priori^,  go  to  ipi  ciyoca^;^ 


43.  N  +,  +  W  +,  W  =  NOUS 


7  +,  +  W  +,  ^  W  +  VERB 

44.  ,  W  »  sljfjg.  eount.  (H)  +  prop.  W  »  VERB 

,  W  4  sing,  cotait.  (H)  prfep.^  ^Jc*  aoo<irding  to  prioirity 

(Ho  priority,  go  to  najrt.mls) 


(NOTBt  For  dofinitlon  of  "(P.B*)**,  *8*  «nd  "stdtjoei*  aoo  wlte  of  OOiOCRD.) 


I*  W  B  priority  ■  0  Qo  to  n.  Priority  «  1  W  «  VOQR,  Priority  >2  W  ■  VRB* 

n*  (p.B.) ..  •  •  s  B  prop  ^  w  B  booh 

(P.B.)  •  •  •  8  B  8«A>joet  ^  Qo  to  A. 

(P.B.)  •  •  •  no  8  found  ,  •  •  W  •$>  Qo  to  HI* 

A.  (P.B.)  -  V  ...  V  B  40  (4)  pltonl  (H)  W  «  VERB. 

(P.B.)  »  V  ,  .  .  W  »  +8  (+)  (V)  B  HOW. 

(P.B.)  bV.  .  .Wb48  ^getoAofJUD  OOiCGRD. 

B. (P.B.) ... S = plural ... (no and, or, nor) ... W = plur(N) => W = NOUN

(P.B.) ... S = plural ... (no and, or, nor) ... W ≠ plur(N) => W = VERB

(P.B.) ... S ≠ plural ... (no and, or, nor) ... W = plur(N) => W = VERB

(P.B.) ... S ≠ plural ... (no and, or, nor) ... W ≠ plur(N) => W = NOUN


AND  CCMCGBD  to  go  «lth  BSMBU 


see  hkl 


#  •  •  •  (no  vex^)  •  •  •  ConJ.  (+)  V  W  «  MOOR 

#...(•(•)  V  ..  .  ConJ.  (••■)  W  a  »f8  ^  W  «  NOON 

#...(+)V,..  OonJ,  (+)  W  4  ^  W  ■  7BIB 

A. 

1.  #  ...  (no  subject)  .  .  .  V  .  .  .  C  (+)  W  W  ■  NOW 

2.  (no  prep,  ooonrs)  .  .  .  subject  .  .  .  V  ^  VaU  Coneexd* 

vlth  this  subject 

3.  Repeat 

*  as  in  CONCORD  IIB. 
APPENDIX XIV (cont.)


-ing Ambiguities Rules

NOTE: F(F) means "word following (F)", where (F) means "word
following W" (= ambiguous item), ignoring adverb/adjectives
and adverbs.

P* is like (P) except that it also ignores adjectives.
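
The neighbor notation in this note amounts to a scan outward from W that skips certain word classes. The sketch below illustrates one way to compute it; the tag names ("ADV", "ADV/ADJ", "ADJ") and the function `neighbor_index` are assumptions made for illustration and do not come from the report.

```python
from typing import FrozenSet, List, Optional, Tuple

Token = Tuple[str, str]  # (word, tag); the tag names are hypothetical

# Classes ignored when locating (P) and (F).
ADVERBIAL: FrozenSet[str] = frozenset({"ADV", "ADV/ADJ"})
# Classes ignored when locating P* (adjectives are skipped as well).
ADVERBIAL_OR_ADJ: FrozenSet[str] = ADVERBIAL | {"ADJ"}


def neighbor_index(tokens: List[Token], i: int, step: int,
                   skip: FrozenSet[str] = frozenset()) -> Optional[int]:
    """Index of the nearest token before (step=-1) or after (step=+1)
    position i whose tag is not in `skip`, or None if there is none."""
    j = i + step
    while 0 <= j < len(tokens):
        if tokens[j][1] not in skip:
            return j
        j += step
    return None
```

With W at index `w`, P is `neighbor_index(tokens, w, -1)`, (P) passes `ADVERBIAL` as the skip set, P* passes `ADVERBIAL_OR_ADJ`, and F(F) is the token one position past the index returned for (F).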

1.  Is  W  capitalized  but  not  the  first  word  in  the  sentence?  If  yes,  take 
as  NOUN. 

2.  Is  (F)  an  article,  pronoun/adjective,  possessive  adjective  or  possessive 
noun?  If  yes,  take  W  as  VERB. 

3.  Is  (P)  an  article?  If  yes,  take  as  ADJ. 

4.  Is  P  =  very?  If  yes,  take  as  ADJ. 

5. Is (P) = adjective (≠ -ing)? If yes, take as ADJ.

6.  Is  (P)  an  auxiliary,  modal,  or  copulative?  If  yes,  take  as  ADJ. 

7. Is (F) a present tense verb, modal, copulative or auxiliary? If yes,
take as ADJ.

8. Is (P) = during? If yes, take as ADJ.

9. Is F = method(s), procedure(s), exercise(s), or technique(s)? If yes,
take as ADJ.

10. Is W almost always a verb (i.e. a L,D.P. )? If yes, take as VERB.

11. Is F an object or indefinite pronoun, more, most, you, it? If yes,
take as VERB.

12. Is (P) a plural noun? If yes, take as VERB.

13. Is P an indefinite pronoun, who, whoever, no one? If yes, take as VERB.

14. Is F an indefinite pronoun, no one, who, whoever? If yes, take as VERB.

15. Is (P) an indefinite or object pronoun, it, you? If yes, is F a
candidate noun? If no, take as VERB.

16. Is F = itself? If yes, is FF a verb? If yes, is (P) a noun? If no,
take as VERB.

17. Is F a reflexive pronoun? If yes, take as VERB.

18. Is (W) the first word in the unit? If yes, is next verb candidate part
of auxiliary, modal, or copulative? If yes, take as ADJ.
19. Is P = as? If yes, is F punctuation or as? If yes, take as ADJ; if no, is
F an adverb? If yes, take as ADJ; if no, see if Previous Verb is part of
appear, consider. If yes, take as ADJ.

20.  Is  (F)  a  possessive  adjective  or  whose?  If  yes,  take  as  VERB. 

21. Is (F) = of? If yes, is W = consisting, dying, speaking, talking, telling,
thinking? If yes, take as VERB; if no, take as ADJ.

22.  Is  P  a  singular  countable  noun?  If  yes,  is  P*P  an  article?  If  no,  take  as 
NOUN. 

23.  Is  F  =  due  to?  If  yes,  take  as  NOUN. 

24. Is (F) = preposition (≠ to)? If yes, take as VERB.

25. Is P = most, too? If yes, take as ADJ.

26. Is F a singular countable noun candidate? If yes, take as ADJ.

27.  Is  P  a  verb?  If  yes,  take  as  ADJ. 

28.  Is  P  a  possessive  noun  or  possessive  adjective?  If  yes,  take  as  ADJ. 

29.  Is  W  intransitive  as  a  verb?  If  yes,  is  F  a  noun  or  adjective?  If  yes, 
take  as  adjective;  if  no,  take  as  VERB. 

30. Is (P) = of? If yes, is P(P) = accident, aim, art, capable, certainty,
custom, device, difficulty, ease, effect, feasibility, habit, hope,
hopelessness, idea, impossibility, incapable, interest, means, method,
necessity, object, obligation, possibility, practice, problem, purpose,
question, result, rule, sake or their plurals? If yes, take as VERB;
if no, as ADJ.

31. Is (P) = with, between? If yes, take as ADJ.

32. Is (P) = for? If yes, is P(P) = cure, device, facility, fame, flair,
machine, necessity, need, notoriety, notorious, reason, talent, or their
plurals? If yes, take as VERB.

33. Is (P) = ^? If yes, is (P)(P) part of begin, continue, end, end up, finish,
finish up, open, start, start out, start up, stop? If yes, take as VERB.

34.  Is  (P)  a  preposition?  If  yes,  is  F  a  punctuation  mark?  If  yes,  take  as 

ADJ.;  if  no,  is  F  an  adverb?  If  yes,  take  as  VERB;  if  no,  see  if  F  is  a 

pronoun,  adjective,  or  candidate  noun.  If  no,  take  as  ADJ;  if  yes,  see  if 

FF  is  a  preposition.  If  yes,  take  as  VERB;  if  no,  see  if  P(P)  is  a  verb. 

If  yes,  take  as  ADJ. 

35.  Is  F  a  candidate  noun?  If  yes,  is  (F)F  a  verb?  If  yes,  take  as  ADJ. 

36. Is (P) = let? If yes, take as ADJ.

37. Is (P) a pronoun/adjective? If yes, take as ADJ.

38. Is (P) a conjunction (i.e. and, or, nor, than) or punctuation
mark? If yes, is F a noun? If yes, is P(P) an adjective? If yes,
take as ADJ.

39. Is (P) a punctuation mark or a conjunction? If yes, is P(P) an -ing
word? If yes, take same as P(P).

40. Is F a punctuation mark? If yes, is Previous Verb part of begin,
continue, help, keep, lie, send, sit, stand, start, stop, try, work?
If yes, take as VERB; if no, take as ADJ.

41.  Take  W  as  VERB. 


NOTE:  1)  No  search  crosses  a  punctuation  mark. 

2) -ing's resolved as VERB are also set as ABSOLUTE BREAKERS.
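
The -ing rules form an ordered cascade: each question is tried in turn, the first one that fires decides the class of W, and the last rule is a default. The sketch below illustrates that control flow for rules 1 through 4 and the final default (rule 41) only; the intermediate rules are deliberately omitted, so its output will differ from the full rule set wherever one of them would have fired. The tag names, the token representation, and the function names are assumptions for illustration, and `tokens` is assumed to hold a single sentence.

```python
from typing import FrozenSet, List, Optional, Tuple

Token = Tuple[str, str]  # (word, tag); tag names are hypothetical
ADVERBIAL: FrozenSet[str] = frozenset({"ADV", "ADV/ADJ"})


def nearest(tokens: List[Token], i: int, step: int,
            skip: FrozenSet[str] = frozenset()) -> Optional[Token]:
    """Nearest token in direction `step` whose tag is not in `skip`."""
    j = i + step
    while 0 <= j < len(tokens):
        if tokens[j][1] not in skip:
            return tokens[j]
        j += step
    return None


def resolve_ing(tokens: List[Token], i: int) -> str:
    """Classify the -ing word at index i as NOUN, VERB, or ADJ."""
    word = tokens[i][0]
    p = nearest(tokens, i, -1)                  # P
    p_par = nearest(tokens, i, -1, ADVERBIAL)   # (P)
    f_par = nearest(tokens, i, +1, ADVERBIAL)   # (F)

    rules = [
        # Rule 1: capitalized but not the first word in the sentence.
        (i > 0 and word[:1].isupper(), "NOUN"),
        # Rule 2: (F) is an article, pronoun/adjective, or possessive.
        (f_par is not None
         and f_par[1] in {"ART", "PRON/ADJ", "POSS_ADJ", "POSS_N"}, "VERB"),
        # Rule 3: (P) is an article.
        (p_par is not None and p_par[1] == "ART", "ADJ"),
        # Rule 4: P = very.
        (p is not None and p[0].lower() == "very", "ADJ"),
    ]
    for fired, result in rules:
        if fired:
            return result
    return "VERB"  # Rule 41: default
```

Because the order of the questions carries the priority, adding a rule to such a cascade means choosing its position as well as its condition.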




-ing  Ambiguities  Pictures 


1. W = capitalized ≠ first word in sentence => W = NOUN

2. W (+) [Art., Pron./A, Poss. A, Poss. N] => W = VERB

3. Art. (+) W => W = ADJ.

4. Very + W => W = ADJ.

5. A ≠ ing (+) W => W = ADJ.

6. [Aux., Modal, Copula.] (+) W => W = ADJ.

7. W (+) [Aux., Modal, Copula., Present Tense V.] => W = ADJ.

8. During (+) W => W = ADJ.

9. W + Method(s), etc. => W = ADJ.

10. W = "funny word" => W = VERB

11. W + [indef. Pr., obj. Pr., more, most, you, it] => W = VERB

12. Plur. N (+) W => W = VERB

13. [indef. Pr., who, whoever, no one] + W => W = VERB

14. W + [indef. Pr., who, whoever, no one] => W = VERB

15. [indef. Pr., Obj. Pr., it, you] (+) W + ≠ (N) => W = VERB
16. ≠ (N) (+) W + itself + V => W = NOUN

17. W + Refl. Pron. => W = VERB

18. W = 1st word in unit ... (V) = [aux., modal, copula.] => W = ADJ.

19. as + W + [Punctuation, as] => W = ADJ.

as + W + D => W = ADJ.

V = "appear, consider" + as + W + ≠ [adverb, as, punctuation] => W = ADJ.

21. W = consisting, etc. (+) of => W = VERB

W ≠ consisting, etc. (+) of => W = ADJ.

22. ≠ Art. +* Sing. Count. N + W => W = NOUN

23. W + due to => W = NOUN

24. W (+) Prep ≠ to => W = VERB

25. [most, too] + W => W = ADJ.

26. W + Sing. Count. (N) => W = ADJ.

27. V + W => W = ADJ.

28. [poss. N, poss. A] + W => W = ADJ.

29. W = intr. (V) + [N, A] => W = ADJ.

W = intr. (V) + ≠ [N, A] => W = VERB

30. accident, etc. + of (+) W => W = VERB

≠ accident, etc. + of (+) W => W = ADJ.

31. with, between (+) W => W = ADJ.

32. cure, etc. + for (+) W => W = VERB

33. begin, etc. (+) ^ (+) W => W = VERB
34. Prep. (+) W + Punct. => W = ADJ.

Prep. (+) W + D => W = VERB

Prep. (+) W + ≠ (Pron., A, N) => W = ADJ.

Prep. (+) W + (Pron., A, N) + Prep => W = VERB

Verb (+) Prep. (+) W + (Pron., A, N) => W = ADJ.

35. W + (N) (+) V => W = ADJ.

36. Let (+) W => W = ADJ.

37. Pron/A (+) W => W = ADJ.

38. A + Conj. or Punct. (+) W + N => W = ADJ.

39. -ing (+) conj. or punct. (+) W => W = previous -ing

40. V = begin etc. + ... + W + Punct. => W = VERB

V ≠ begin etc. + ... + W + Punct. => W = ADJ.


APPENDIX XIV (cont'd.)
Past Tense/Adjective


NOTE: (P) = previous word, ignoring adverbs and adverb/adjectives
F = following word

lie. 

1*  Za  W  an  atodllavgr,  aadalt  er  MpvOat&vat  If  jaa,  la  (P)  an  .a«dllaigr« 
nadal,  av  eaptOatifat .  If  yaa,  taka  W  aa  ADJ.t  If  no  or  if  (P)  la  pona- 
inatloB,  la  V  atocHlary  or  nedalT  If  jos,  taka  aa 

2. Is (P) an article? If yes, take as ADJ.

3. Is (F) an article? If yes, take as VERB.

4. Is (P) a possessive adjective or possessive noun, or an adjective?
If yes, take as ADJ.

5. Is (F) a possessive noun, possessive pronoun, possessive adjective, or
whose? If yes, take as VERB.

6. Is (P) = very? If yes, take as ADJ.

7. Is (P) an auxiliary, copulative, or modal? If yes, take as ADJ.

8. Is (P) an indefinite or subject pronoun, whoever, who, no one, you?
If yes, take as VERB.

9. Is (P) a preposition ≠ as? If yes, take as ADJ.

10. Is (F) an indefinite or object pronoun, no one, you? If yes, take
as VERB.

11. Is (F) a reflexive pronoun? If yes, take as VERB.

12. Is (F) a preposition? If yes, take as VERB.

13. Is (W) the first word in the unit? If yes, is F an adjective, noun, subject
or indefinite pronoun, or you? If yes, take as ADJ.

14. Is F a candidate noun? If yes, is (F)F a verb? If yes, take as ADJ.

15. Is F a singular countable noun? If yes, is FF a noun candidate? If no,
or if FF is punctuation, take as ADJ.

16. Is (P) a noun, object pronoun, you, or it? If yes, is (P)(P) a verb subclass
05, 08, or 17? If yes, take as ADJ.; if no, see if (P)(P) is a two object
verb; if yes, is F a candidate noun? If yes, take as ADJ.

17. Is W capitalized but not the first word in the sentence? If yes, take
W as NOUN.


17. Is (P) a noun? If yes, take as VERB.

18. Is (P) = as? If yes, is F = as? If yes, take as ADJ.

19*  Is  (P)  »  altoetiriit.  beoaose,  since.  thoTMfli.  itot  hen,,  hwwrpr.  onss.' 
so,  thus,  shat.  ehatererTwerseer.  thatt  ggl  If  yes,  is  F  pmatuation? 
Tf  yes,  talcs  as  lW5r*” 

20. Is (P) a verb? If yes, is F a candidate noun or an adjective? If yes,
take as ADJ.

21. Is (P) a pronoun/adjective? If yes, is (P)(P) a verb, preposition, or
possessive adjective? If yes, take as ADJ.

22. Is (P) = although, because, since, even though, how, if, until, where,
whereas, whether, while, whilst, what? If yes, is F a candidate noun?
If yes, take as ADJ.

23. Is (F) an English numeral, Arabic numeral, or other symbol? If yes,
take as VERB.

24. Is (P) = that or which? If yes, is F an adjective/pronoun or adverb
or punctuation? If yes, take as VERB; if no, take as ADJ.

25. Take W as VERB.
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Bart  Tanaa  idjartlra  Plotw 


1«  axDdlinry  ~) 
Mdal  ( 
eoptCLatlre  J 

( auxUlaiy  'j 
2  {aodal  > 

I  eoDtilatlTa  j 
V^ponc,  y 

Z,  artlola  (•(•)  V  V 
3,  W  (+)  artlola  ^  W 


/atodltazj^ 
(+)  W  «  /  Mdal  \ 

eoptdativaj 

ww. 


If  ■  HU* 

vrnrm 


4,  poaaaaaiva  noting 

poaaaaalTa  adj*  r  (<(■)  W  *  ADJ. 
adjaetlTe  lay)) 


5. 


W  (+) 


'pasa*  aoan 
peas.  adj4 
poaa.  pro* 
idwaa 


V-  vns. 


6*  varr  (+)  W  W  ■  AW* 


7* 

8* 


atDc*"\ 

Md*  >  (+)W^W«AW* 

oop*J 


iadaf*  or  aabj*  Pr*^ 
rto.  rtgarar,  ^  j 
joa.  no  ona  J 


(•••)  V  W  «  VBtB 


9*  P  ^  aa  (+)  W  W  *»  ADJ* 


10* 


W  (+) 


f  ladaf*  or  obj*  Pr*"] 

\  no  OP*  J 


^  V-  WB 


U* 

12* 

13. 


W  (+)  raUaxlTO  Pr*  W  ■  YIBB* 
W  (•^)  prapoaitloii  V  a  VBtS* 

!  J  « "  ♦  ( 


14*  ff  +  (I)  +  V  W  -  ADJ* 

15*  W  +  alBg*  oomt*  i*  +  (^  (I)  )  »#  W  *  ADJ* 


.011. 


16.  Vs/subblass  *1  /Cbjaet  "I  MvAVmUa, 

(05.  08.  17 J  U.  20tt.lt/  MW. 

ttto  object  V  (+)  ^  <♦>  W  +  (I)  W  -  ADJ. 

17.  H  (+)  W  ^  W  »  VIHB 


18.  «£  (+)  W  +  as  W  »  IDJ* 

19.  aW 


(+}  V  -4-  pttnetttatlen  ^  W 


20,  V.  (+)  W IDJ. 


21.  (I  P. 


pess 


•x.] 


(+)  Pp/A  (+)  W  B  ADj. 


22,  altheqgh.  because,  sinoe,  ^en  thoggO 

ho^ll,  wntll.  ttfaers.  idierMis,  ?  (+)  W  +  (M)^  W  *  ADJ. 

ttbsther,  tthiie.  ehliet.^^i  J 


23. 

C  Aigllsh  nuneral  1 

w  (+) 

^  Arable  no. 
(  syiribel 

j  w. 

.  WRB. 

2t.  that  1 

(+)  W  +  i 

,'aVa  ( 

[^punctuation  J 

^  V«  7IRB 

1 

thatl 

(+)  W  +  ^ 

f  Pr/A 
{  D 

[  ^  W  *  ADJ, 

^punctuation  _ 

1 
APPENDIX XIV (cont'd.)


Present Tense Verb/Adjective

NOTE: F(F) = word following (F)

(F) = word following W, ignoring adverbs and adverb/adjectives
F* = same as (F), but also ignoring adjectives (articles,
possessives)

1. Is W capitalized but not the first word in the sentence? If yes, take W
as NOUN.

2. Is W a + s (i.e. does it have an s affix)? If yes, take as VERB.

3. Is (P) an article, possessive noun, or possessive adjective? If yes,
take as ADJ.


5. 

6. 

7. 

8. 

9. 

10. 

n. 

12. 

13. 

14. 

15. 


Is  p  a  ooMparatlraly.  too.  strlklniflLy.  tnrJ  If  yas,  taka  as  ADJ. 


la  (P)  - 

afaarsaa. 


|SSSSS£* 

sMla. 


*  1^* 
.t  yas,  tSca  as  AZw^ 


Is  (P)  a  snbjaot  pronotmT  If  yas,  taka  as  VBB. 


Is  (F)  a  possasaiTs  or  raflaxlTa  or  objaot  pranoan,  2SB^  ^  F*** 
taka  as  FBtB. 

Is  (P)  a  anothar.  May.aaoh.  moh,  no  ona.  ma,  othar.  snAt 
If.  yas,  taka  as  UffT" 

Is  (P)  a  nodal  ▼oxb?  If  yas,  take  as  FSBB. 

Is  (P)  an  aTadllaiy  or  ooptOLative  raAt  If  yas,  taka  as  ADJ. 


Is  P  a  not?  If  yas,  taka  as  ADJ. 

Is  (P)  a  proposition  (»/  lo)T  If  yas,  taka  as  ADJ. 

Is  (P)  a  tsA  snbdlass  06,  25,  27?  If  yas,  taka  as  ADJ?  If  w>,  ija  U 
(F)  la  an  article,  possasslTe  adjacrttfa,  prenonn/adjaotlTa,  Indaflnlta 
pronom,  jjgajar,  no  one.  If  yas.  Is  (P)  a  two  object  tsAT  If  no,  taka 
as  7BD. 

Is  F  a  ofT  If  yas,  take  as  ADJ. 

Id  P  a  prononn/adJeotiTe,  nore,  nostt  If  yas,  Is  (P)P  a  praposltlan, 
tsA,  or  artielaT  If  yas,  taka  as  ADJ. 


16.  Is  F  an  adrsAT  If  yas,  taka  as  FSB. 

17.  Is  (P)  an  IntransitlTS  tsAI  If  yas,  taka  as  ADJ. 

18.  Is  (P)  an  adJaotlTar  If  yas,  taka  as  ADJ. 
19. Is (P) an object or indefinite pronoun, no one, whoever, it, you?
If yes, is (P)(P) = let? If yes, take as VERB.

20. Is (P) an indefinite pronoun, who, you? If yes, is (P)(P) a preposition
or a verb? If no, take as VERB.

21. Is (P) a plural noun? If yes, is there a preposition or a verb preceding
(P)? If no, take as VERB; if yes, is the Preceding Verb intransitive?
If yes, take as VERB.

22. Is F a verb? If yes, take as NOUN.

23. Is W the first word in the unit? If yes, is F an Arabic numeral? If
yes, take as VERB.

24. Is W the first word in the unit? If yes, is there a Following Verb
candidate, F.(V.), in the unit? If no, take W as VERB; if yes, is F.(V.)
a modal, copulative or auxiliary? If yes, take as ADJ.

25. Is F a candidate noun? If yes, is (F)F a verb? If yes, take as ADJ.

26. Is (P) a punctuation mark? If yes, is F a noun? If yes, is the Following
Verb a modal, copulative or auxiliary? If yes, take as ADJ.

27.  Is  P  a  Jto?  If  708,  is  PP  a  attry)utable.  attribute(s)(^).  belong. 
eggosed,  opposite,  proportien^^^.  siSBlajr.  si^biecitedli  If  yes,  take 
as  ADJ. 

28. Is (P) = to? If yes, is there a Preceding Verb in the unit? If no,
take as VERB.

29. Is F* a sing. count. noun candidate? If yes, is FF* a noun candidate?
If no, take as ADJ.

30. Is (P) a conjunction (i.e. and, or, nor, but, than) or punctuation? If
yes, is P(P) an adjective? If yes, take as ADJ.

31. Is P = now, yet? If yes, is PP a conjunction? If yes, is PPP an
adjective? If yes, take W as ADJ.

32. Is (W) the first word in the unit? If yes, take as ADJ.

33. Is (P) punctuation? If yes, is (F) punctuation? If yes, is (P)(P) a
verb? If yes, take as VERB; if no, take as ADJ.

34. Take W as VERB.


m  Marehaf  start  with  W. 


3. 

4. 

5. 

6. 

7. 


artlele 

poss.  Moun  )  (+)  W  ^  W  ■  Ad  jeetlve 

poss.  Adj*  j 

eogMmUvs32*  )  -f  w  ^  W  «  ADJ. 
strUclnay.  very  J 

(although.  ....  hoH*  ♦  •  •  ♦  (+)  W  W  »  ATJ. 

subject  Pron.  (+)  W  -7  W  ■  VBRB 


W  (+) 


^poss.  Pro. 
rsilex.  Pro. 
Objeet  Pro. 
(it.  2SS 


W  >  VSRB 


3. 


9. 

10. 

11. 

12. 

1^5. 


(another.  •  •  .  «  SJsis)  '^*’5 
modal  T  (+)  -  V«B 

auxlllaipr  1  (+)  ws  W  -  ADJ. 

oepulatirej 

not  +  W’^W  •  ADJ. 

Prep  (^  to)  W  ^  W  «  IDJ. 

(V  subclass  06,  25,  27)  (+)  W  ♦  W  »  AW. 

r artld.s 
j  pass.  AW. 

4  tso  object  V  (+)  W  (♦)  S  Pr/adj.  >^vr  »  VERB. 

/  indsf.  Pr. 


14. 

15. 


W  *  ^  W  ■  AW. 

(+) 


P 
V 

art. 


[  }&SESI*  sajsi^ 


\  mn  I  {♦)  W  -  ADJ. 


l6.  W  +  adrerb  ^  W  ■  VIRB. 


17.  Intrans.  V  (+)  W^W  ■  AW. 


+  W  =>  W  =  VERB 


19.  let  (+) 


/obj,  or  indef.  Pr,  1 
/whoever,  no  one,  it.  von  ) 


20. 


(+) 


/  indef,  Pr,~) 
I  who,  20U  j 


(+)  W  W  »  VERB 


21.  . 

J  ;■+...  +  plural  N  (+)  W  ^  W  =  VERB 
intrans,  V  +  ,  ,  ,  +  plural  N  (+)  W  W  =  VERB 


22. 

23. 

24. 


W  +  V  W  =  NOUN 
(W  =  1st  word  in  unit^ 

(W  =  1st  word  in  unit)  + 
(W  =  1st  word  in  unit)  + 


+  Arabic  numeral  W 


VERB 


W  =  VERB, 
4>  W  =  ADJ. 


25.  W  +  (N)  (+)  V  =5^  W  =  ADJ. 

/  modal  I 

26.  Punct,  (+)W+N  +  ...  +  T  cop,  r  ^  W  a  ADJ. 

[  aux,  j 

27.  (attribute.  ...  ,  sub.ieoted)  +  ia  +  W  W  =  ADJ* 

28.  no  V  +  ...  +  to  (+)  W  W  =  VERB. 

29.  W  (+)*  sing,  count,  (N)  +it(N)  W  =  ADJ, 

30.  /"g  1 

iPunct.  I  WW'^’VI.ADJ. 

I  +W^W>»W. 

32.  V  (+)  Punct.  (+)  W  (+)  Punct,  ^  W  »  VERB 

(+)  Punct,  (+)  W  (+)  Punct.  ^  W  =  ADJ. 

33.  W  =  VERB 
APPENDIX XIV (con't.)


Note: (F) = Following word (ignoring adverbs + adverb/adjectives).

1. Is W capitalized and not the first word in the sentence? If yes, take
W as NOUN.

2. Is (P) an article, possessive noun, or possessive adjective? If yes,
take as NOUN.

3. Is (P) indefinite or object pronoun, no one, it, you? If yes, is (P)(P)
= let? If yes, take as VERB.

4. Is (F) an object pronoun? If yes, take as VERB.

5. Is (P) a subject or indefinite pronoun, who, whoever, it, you? If yes,
take as VERB.

6. Is (P) a modal verb? If yes, is W a+s (i.e. does W have an s affix)?
If no, take as VERB; if yes, take as NOUN.

7. Is (P) an auxiliary verb? If yes, take as ADJECTIVE.

8. Is (P) a verb? If yes, take as NOUN.

9. Is (P) a preposition (≠ to)? If yes, take as NOUN.

10. Is P an adjective (≠ -ing) (i.e. an adjective not resolved by routine)?
If yes, take as NOUN.

11. Is (F) a verb? If yes, take as NOUN.

12. Is W the first word in the unit? If yes, is W a + s? If yes, take as
NOUN; if no, see if there is a Following Verb candidate, F.(V.). If no,
take as VERB; if yes, see if F.(V.) is an auxiliary, modal, or copulative.
If yes, take as NOUN.


13. Is W a+s? If yes, is (P) = to? If yes, take as NOUN.

14. Is W = thought, spoke? If no, is (F) = of? If yes, take as NOUN.

15. Is W a+s? If yes, is P = another, each, no one, this? If yes,
take as VERB; if no, see if P is a pronoun/adjective. If yes,
take as NOUN.

16. Is P a pronoun/adjective, that? If yes, is (P)P a preposition or a verb?
If yes, take as NOUN.

17. Is (F) a possessive noun, possessive pronoun, possessive adjective, or
whose? If yes, take as VERB.

18. Is F a singular countable noun? If yes, take as NOUN.
19. Is (F) = adj.? If yes, is W = felt? If yes, take as VERB. If no, is F(F)
a preposition? If yes, take as NOUN.

20. Is P a singular countable noun? If yes, is (P)P punctuation, conjunction
(i.e. and, or, nor, than, but), preposition, verb? If yes, take as NOUN;
if no, is (P)P an -ing or past participial adjective? If yes, is (P)(P)P
an auxiliary? If yes, take as NOUN.

21. Is W a+s? If yes, is (P) a noun, indefinite or object pronoun, you,
it? If yes, is (P)(P) a two-object verb? If yes, take as NOUN.

22. Is (P) a plural noun? If yes, take as VERB.

23. Is W a+s? If yes, is P an Arabic or English numeral? If yes, take as NOUN.

24. Is F an article, pronoun/adjective, indefinite pronoun, reflexive pronoun,
more, most, very? If yes, take as VERB.

25. Is F a past tense verb candidate? If yes, is FF a noun or an adjective?
If no, take as NOUN.

26. Is P an -ing word? If yes, take W as NOUN.

27. Is P punctuation? If yes, is F punctuation? If yes, is PP a verb or
adjective? If no, take as NOUN; if yes, take W as PP.

28. Take W as VERB.
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1,  W  =  cap  (  ^  1st  word  in  s«nt, )  W  »  NOUN 


article 

possessive  noun 
possessive  adjective 


(+)  W  W  •  MODN 


Indefinite  pronoun 
object  pronoun 


Let  (+)  <  no  one 


4.  W  (+)  object  pronoun  =>  W  «  VERB 


(+)  W  ^  W  »  VERB 


5.  subject  pronoun 

Indefinite  pronoun 
who 

t^ever 

22U  ^ 


(+)  W  «  VERB 


6.  modal  verb  (+)  W  with  s  affix  W  a  NOUN 

modal  verb  (+)  W  with  no  s  affix  =?  W  »  VERB 

7.  auxiliary  (+)  W  W  ID3, 

8.  verb  (+)  W  =>  W  »  NOUN 

9.  preposition  te)  (+)  W  W  »  NOUN 

10,  adjective  with  no  -Ing  affix  +  W  W  »  NOUN 
U,  W  (+)  verb  W  =  NOUN 

12.  W  s  1st  word  in  unit  and  also  with  s  affix  V  •  NOUN 
W  =  1st  word  in  unit  +  no  FV  •=^  W  •"VBBB 

f  auxiliary ') 
nodal  j  ^  W  ■  NOUN 
oopulativeJ 

13.  to  (+)  W  with  s  affix  W  »  NOUN 

^  /snok?^  of  -i>  W  «  NOUN 


15.  another 
each 
no  one 


+  W  with  s  affix  ^  W  »  VERB 


this  J 
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15.  (Con«t.) 

pronoun/adjectlve  +  W  with  a  affix  W  ■  NOON 

16,  preposition  j  (+)  /  pronoun/adjeotlve^  +  W  W 

verb  J  (that  j 

^  V  =  NOON 


NOON 


17. 


W(+) 


I  possessive  noun 
'possessive  pronoun 
possessive  adjective 
^se  J 


18.  W  3  singular  oountahke  noun  ^  W  «  NOON 

19.  W  (|4  felt)  (+)  adjective  +  preposition  ^  W  »  NOON 


(!) 


singular 
countable 
noun  > 


+  W  ^  W  3  NOON 


20.  punctuation 
ooH^otion 
(i.e.  a^,  w, 
nor,  thin,  but 
proposition  / 
verb  J 

f  singular  I 

%  auxiliaiy  (+)  -ing  or  past  participial  A  (+)  1  countable/  +  W  W  «  hoon 
^  \  noun  ^ 


21. 


/  noun 

j  indefinite  pronoun  I 
two  object  verb  (+)  /  object  pronoun 


(+)  W  with  8  afflx^  W  3 


Vs.- 

22.  plural  noun  (+)  W  W  »  VERB 

+  W  with  8  affix  W  »  NOON 


W  + 


24.  ( article 
pronoun/ adjective 
indefinite  pronoun 
reflexive  pronoun 

VXSEZ 

25.  W  -f  past  tense  verb  candidate  ■i' 

26.  affix  +  W  ^  W  a  noon 


W  3  VERB 


noun 


adjective 


^  W  -  NOON 


27,  verb  +  punctuation  +  W  +  punctuation  W  »  VERB 
adjective  +  punctuation  -f  W  -t-  punctuation  ^  W  ■  ADJBOIIVB 
punctuation  +  W  +  punctuation  W  »  NOON 

28.  w  a  Verb 


NOON 
APPENDIX XIV (contd.)


Idiosyncratic Distribution Rules

Notes Concerning Symbolism:

1. W = word under consideration
P = word preceding W
F = word following W
(P) = word preceding W, ignoring adverbs and adverb/adjectives
(F)* = word following W, ignoring adverbs, adverb/adjectives, and
adjectives (other than articles)
PP = word preceding P
P(P) = word preceding (P)

2. These rules also set the proper form class code.

3. When the location of the next instruction is not explicitly stated, the
absence of this statement stands for the command: Go to the first step
of the next rule.


RULES

0. The following words and only the following words enter the odd ball
routine for resolution: august; even; still, well; down, like, except,
near, till; back; can, may, might, will; mine.

1. Is W capitalized and not the first word in the sentence? If yes, take W
as NOUN.

2. Is W = august? If yes, is W capitalized? If yes, take W as a NOUN; if
no, take W as an ADJECTIVE.

3.  Is  (P)*  an  article?  If  yes,  is  (F)  a  preposition  or  a  present  tense 
verb  ji?  If  yes,  take  VT  as  a  NOUN, 

4. Is W = even? If yes, take W as an ADVERB.

5.  Is  (F)  a  a  past  tense  verb  candidate?  If  yes,  is  W  >  down,  nytf.  st^iUL' 
or  w^?  If  no,  is  (P)*  an  article?  If  yes,  take  W  as  a  NOUN. 

6. 1) Is W = like or except? 2) If yes, is P an article or a possessive noun
or possessive adjective? If yes, take W as a NOUN. 3) If no, is P = to?
If no, go to question 5 of this rule. 4) If yes, is PP = verb (list 20)?
If yes, take W as a VERB. 5) If no, is (P) = a modal, I, we, you,
they? If yes, take W as a VERB.

7. Is W = still or well? If yes, take W as an ADVERB.

8. Is P = of? If yes, take W as a NOUN.
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9.  Is  W  =  down,  like,  except,  near  or  tillT  If  yes,  take  W  as  a  PREPOSITION. 

10,  Is  W  3  back?  If  yes,  take  W  as  an  ADVERB, 

11,  Is  W  =  mine?  If  no,  take  W  as  a  VERB. 

12,  Is  P  a  preposition  If  y®®*  "take  W  as  a  possessive  PRONOUN.  If 

no,  is  P  =  tot  If  yea,  is  PP  a  member  of  list  20  or  21J  If  yes,  take 
W  as  a  possessive  PRONOUN. 

13. Is (P) a conjunction? If yes, is P(P) an adjective or possessive pronoun?
If yes, take W as a possessive PRONOUN.

14. Set the NOUN/VERB form class codes.
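
The ordered, first-match character of the rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it covers rules 0, 1, 2, 4, 7, 8, 9, 10 and 11, and it omits rules 3, 5, 6, 12 and 13, which need form-class information about neighboring words. The function name `resolve_odd_ball` and its arguments are hypothetical and do not come from the report.

```python
from typing import Optional

# Rule 0: only these words enter the odd ball routine.
ODD_BALL_WORDS = {
    "august", "even", "still", "well", "down", "like", "except",
    "near", "till", "back", "can", "may", "might", "will", "mine",
}
PREPOSITION_WORDS = {"down", "like", "except", "near", "till"}


def resolve_odd_ball(w: str, prev: Optional[str], first_in_sentence: bool) -> Optional[str]:
    """Return a form class for W, or None if the simplified cascade cannot decide.

    Rules 3, 5, 6, 12 and 13 are deliberately left out of this sketch.
    """
    lower = w.lower()
    if lower not in ODD_BALL_WORDS:          # Rule 0
        return None
    capitalized = w[:1].isupper()
    if capitalized and not first_in_sentence:  # Rule 1
        return "NOUN"
    if lower == "august":                     # Rule 2
        return "NOUN" if capitalized else "ADJECTIVE"
    if lower == "even":                       # Rule 4
        return "ADVERB"
    if lower in {"still", "well"}:            # Rule 7
        return "ADVERB"
    if prev is not None and prev.lower() == "of":  # Rule 8
        return "NOUN"
    if lower in PREPOSITION_WORDS:            # Rule 9
        return "PREPOSITION"
    if lower == "back":                       # Rule 10
        return "ADVERB"
    if lower != "mine":                       # Rule 11
        return "VERB"
    return None                               # "mine" needs rules 12-13
```

Because each rule returns as soon as it fires, the order of the tests carries the same meaning as the rule numbering: for instance, a word preceded by "of" is taken as a NOUN (rule 8) before the preposition test of rule 9 is reached.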
APPENDIX XV — Absolute Breakers


I. 


because 

how 

if 

what 

when 

whenever 

where 

whereas 

wherever 

whether 

which 

while 

who 

whom 

whose 

why 

although 

whichever 

whoever 

whomever 

whosoever 

albeit 

unless 

though 

whereby 

insofar 

whereat 

wherein 

whereafter 

wherefore 

lest

whilst 

V-ing 


whensoever

whence 

whencesoever

wherefrom 

whereinto 

whereof 

whereon 

wherethrough

whereto 

whereunto 

whereupon 

wherewith 




II.


at  the  same  time  as 

at  the  same  time  that 

as  if 

as  though 

by  the  time 

else  that 

even  if 

even  though 

for  fear  that 

for  then 

however  many 

however  much 

if  and  only  if 

inasmuch  as 

in  order  not  to 

in  order  that 

in  order  to 

in  so  far  as 

in  the  hope  that 

just  as  if 

just  as  though 

just  because 

not  even  if 

not  even  though 

now  that 

or  else 

so  as 

so  that 

such  that 

the  way  in  which 

the  way  that 
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Ill  -  z 


(1)

(•provided) + that  (+)  article
     "          "    "   subject pronoun
     "          "    "   indefinite pronoun
     "          "    "   possessive pronoun
     "          "    "   possessive adjective
     "          "    "   plural noun
     "          "    "   proper name
     "          "    "   verb
     "          "    "   just
     "          "    "   that
     "          "    "   adjective /+/ plural noun


'Conjunction  (and,  taut,  qr.  nor,  jjy^i^an)  (+)  verb 
Preposition  to)  (+) •  verb 


'to + present tense verb with no s affix
to  (+)  'verb  (not  a  singular  present  tense  verb) 
irregular  verbial  past  participle  (driven,  lain,  etc.) 
•after  (+)  past  tense  verb  or  past  participle 

•until  "  "  R  R  R  R  R 

•since  "  "  R  R  R  R  R 

•before  "  "  r  r  r  r  r 


verb  (+1  •  conjunction 
•as  +  ^ 

■  subject  pronoun 
"  "  "  verb 


'so  +  article 
"  "  noun 
"  "  "  pronoun 

R  H  R  y0j>l3 

•until  +  it,  you 

•since  +  it,  you 

•however  +  adverb/adjective 

punctuation  +  'but  SpR|poun 

R  R  R  +  proper  name 

pronoun 

R 
R 

fl 
R 


•after  +  subject 
•until  "  " 

•since  "  " 

•biefore*  " 

•besides*  " 




object  pronoun  ♦  *  subject  pronoun 

indefinite  pronoun  •  s  •  • 

preposition  s  s  *  • 

adjeotive/pronoun  ■  •  «  ■ 

noun  "  * 

verb  4  (modal,  aux.)  ■  •  ■  " 


indefinite  pronoun 
possessive  pronoun 
possessive  adjective 
dictionary  adjective 
present  tense  adjective 
such 


■ 

n 

M 

II 

M 

■ 
APPENDIX XVI


Clause Division


Conventions: / means clause break; + followed by; (+) followed by, ignoring
adverbs and adjective/adverbs; /+/ followed by, ignoring everything
except Verbs and Punctuation; /+/* followed by, ignoring
everything except Verbs. NG means Nominal group; PG nominal
group headed by a preposition. V means Verb; BV "believe"-type
Verb. ^ means that the item so marked if present in the
specified position makes the rule inoperative. T means transitive,
I intransitive. # means beginning of sentence.


Rule  1 


Rule  2 


Rule  2a 


Rule  4 


absolute  breaker  list  1  /+/  /,  /+/  V 

absolute  breaker  list  1  /+/  %  %|>  /+/  V 


absolute  breaker  list  1  (+)  abs,  br.  list  1  (+)  Adjective  (+) 

abs.  br.  list  1  (+)  abs.  br.  list  1  (+)  adjective  +  ^ 

/.  /+/  V  (+)  /.  (+)  V 

/,  (+)  ed/en-  V  (+)  i  NO  /+/  /,  (+)  V 


V  ^  (BV  (+)  that  +  prep)  /+/  f  adverb  +  /,  +  f  ^verb 

^  noun  (+)  /,  +  - 


^  adjective  (+)  /,  (+)  ft  adjective 


rb  +  /,  +  ft  ajiverb 

W  *  ’‘P  t  po]  ♦(2^' 


/+/  V 


Rule  6 


V  /+/  /,  NO,  /+/,  (+)  V  ft  (T-ed  (+)  ^  NO) 


Rule  8 


V  /+/•  p  T-ed  (+)  i  NO 

(noun 

T  ,<  a  object  »  /t/. 

(possessive  pronoun 


(+)  /  that  /+/*  V 
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Rule  8a  ^noun 

V  2  object  V  /+/"■  J  indefinite  pronounI 

■^reflexive  pronoun  \ 


)ui^ 


|possessive  prohounj 


Rule  9 


noun 
Indef,  pro* 
reflex,  pro,| 
poss,  pro. 


(+>  / 


(+),  +/that  /+/* 


f  article 
I  subject  pi«, 

I indef.  pro, 
iposs.  adj. 

/j.\  /  /pronoun-adjective 

noun  \  )  I  ^adjective  +  preposition 

plural  noun  (+)  /  noun 

V  4  2-object  V  /+/*  indefinite  article  +  noun  (+)  /  plural  noun 

(definite  artioleV 
subject  pro.  j 

poss,  adj.  / 


\hl*  V 


^  so  (+)  adjective  (+)  /  lixiefinite  article 


Rule  9a 


"noun 

indef,  pro,\ 
reflex,  pro,V 
poss,  pro,  J 


Article 

(+)/.  (+)  Jsub,  pro, 

“S indef,  pro, 
^ss,  adj. 


/+/- 


noun  (+)  /.  (+) 
plural  noun  (+)  /,  (+)  noun 
indef,  article  +  noun  (+)  /, J+)  plural  noun 

adj,  ft  comparative  (*)  /, 


definite  artiolS  \ 
|(+)  subj,  pro, 

[^+)  poss,  adj.  \ 


ft  so  (+)  adj,  (+)  /,  (+)  indef,  article 


Rule  10 


m 


aTixiliaiy')  (+)  BV  adj, 
copulative^ 

^pnlative 

it  (+) Jauxlllary  (+)  copulative  adj 
^happens 

^ux,  (-f)  hapoened/ing 


t/*/*  1*1-^ 


Rule  11 


any  form  of 

S®  1*1*  ^ 




Rule  12 


”  V  /  +/*  so  (+)  ine/ed  sdj,  (+)/that  /+/*  V 

12  Cadjeotive  oomparatlvej.  /^  +  oomparativ!^/^/*  y 

'Jaors  J  (c -  ^ 


Rule  14 


Rule 15


V  /+/*  /*A  .1u3t  as 


than  i 


/+/*  V 


as  +  to 


Rule 16

T  ^  copulative  (+)/NQ  /+/*  V 

SU5C.  +  I  adj.J 


Rule  17 

f who 

ffjj  (*)  W  ^ 

I  wS^e  (+)  N(n 


(+)  V  hh  /V 


Rule  18 


BV  . 

‘cSS^S?.}  W  BT-«i3.ctlr.  ij*  //♦/< 


Rule 19


subj,  pro 
j  article 


.  1  indof,  pro. 
noun  (+)  /  '  poss.  pro. 

pcss.  adj. 

gdj,  - 

plural  noun  (+)  noun 


W  V  i  ^1,*^ 


Rule  20 


it  (+) 


^  to  +  BV 
aux,'T+)  BV-adj. 
r copulative 

j  aux.  +  oopulatlve-adj, 
happened/lng 


/+/•  /NO  /+/•  V 


/  /+/*  V 
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Rule  21 


Prep, 


wiose 

whetner 


how 

where 


/+/*  V  /+/*  /to 


^  avuciliary 
\  copulative 

(+);  I 

)  (icax. 

|TOdal(+)teop. 


NG 

infinitive 


Rule  21a 


that  \ 
what  1 

v^se  V 

which 

wTOer 


/+/*  V  /+/*  /there  (+) 


nodal 


[  infinitive 


Rule  22 


#  +  that  +  article 
that  +  plural  noun 
w^i' 
who 
whose 

which 

whether 


now 

toere 

t^en" 

the  fact  that 
the  reason  toat 
the  reason  why 


/+/♦  V 


■k 


(+)  m 


Rule 23


V-en 

V-ing 

V-ed- 


■5a  "  (^T/I 


(+)  ^  Ha 


[auxiliary 

ycopSatlve 
\  pres,  tense 
/  past  tense  only 


Rule  2^ 


Verb  /+/*  PQ  +  ^  {ScoJmtable 
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Be  (+)  iAcii.  := 


2  objeetl 


/+/*  V 


Rule  26 

V  pree.  tensel 
auxiliary  J 


f  V  pres,  tense 
/+/*  /  i  auxiliary 

1  modal 
I  copulative 
APPENDIX XVI (cont.)

Sample Sentences for Clause Division Rules

Rule 1   The whale, though not a fish, lives in the sea.

Rule 2   They published a story which though inexact the secretary released
         to the newsmen.

Rule 2a  The story which though inexact in all details was widely publicised
         aroused a great deal of interest.

Rule 3   The Africans, Livingstone reported earlier, were friendly people.

Rule 4   The man, utterly humiliated before his family, never returned.

Rule 5   After they won the race, the team travelled to Europe.

Rule 6   Due to perturbation caused by stars, such clouds, and the resulting
         clusters, probably never assumed a definite shape.

Rule 7   They found stains caused by fire.

Rule 8   They resent the idea that in many cases power wins over justice.

Rule 8a  They resented the idea, as a matter of course, that power should
         win over justice.

Rule 9   He said something nobody would believe.

Rule 10  It was assumed that evidence could be found.

Rule 11  The problem was that money was not appropriated.

Rule 12  The house was so completely dilapidated that reconstruction was out
         of the question.

Rule 13  The more it rains the greater the amount of plant growth will be.

Rule 14  If Castro is a genius then I eat my hat.

Rule 15  He worked as hard as he could.

Rule 16  Wherever he went a big crowd turned out to cheer him.

Rule 17  People who live in glass houses are nervous.

Rule 18  He began to think the Cubans were insane.

Rule 19  In the discussion we had most issues were settled.

Rule 20  They pretend the missiles are being dismantled.
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Rule  21  On  the  question  of  whether  the  motion  should  be  adopted  there  Is 
total  disagreement. 

Rule 21a In areas drained of water there is drought.

Rule 22  That people are basically good follows from what was said about God.

Rule 23  The method used by the first team was based on false premises.

Rule 24  The sky when you look through a telescope seems nearer.

Rule 25  When the work is done people tend to relax.

Rule 26  The phases of the moon which influence our weather have been
         observed by farmers for many centuries.

Rule 27  This seems, even to modern observers, extremely well constructed.
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APPENDIX  XVIII 


AN  INTUITIVE  COMPUTER  LEARNS  AN  ELEMENT  OF  GRAMMAR* 

D. G. Ellson

Department  of  Psychology  Indiana  University 

Many approaches to problems of information storage and retrieval and
mechanical translation include computer programs for classifying words as
parts of speech. Existing computer programs with this aim have the form
of scientific predictions: classifications are predicted deductively from
dictionary and context information by means of grammatical rules that represent
hypotheses or knowledge concerning regularities in linguistic behavior.

The present study investigates an application of a program which allows the
computer to make classifications intuitively rather than deductively. The
computer becomes a self-organizing system whose output of decisions is
intuitive in the sense that it does not depend upon (the programmer's) prior
knowledge or guesses concerning the relevant empirical laws. The program is
based upon a statistical model, multiple conditional probability, that is also
a theoretical model of human intuitive judgment and of learning. The model
and the primitive form of computer in which it is programmed for this study
were described earlier (1, 2). The computer which utilizes programs of this
type is called EMMA, signifying Empirical Multivariable Matrix Analyzer.

The  present  investigation  is  a  miniature,  a  methodological  experiment  to 
determine  the  feasibility  of  applying  the  EMMA  principle  to  certain  problems 
of  linguistic  analysis.  The  problem  of  classifying  words  as  parts  of  speech 
was  chosen  from  among  other  linguistic  problems  in  part  because  conventional 

* 

This  research  was  sponsored  by  Rome  Air  Development  Center,  U.S.  Air  Force, 
under  contract  No.  AF  30  (602)  -  2185  with  Indiana  University;  F.W.  Householder 
Jr.,  Principal  Investigator,  and  J.  Lyons,  Coordinator. 
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techniques  do  not  provide  a  simple  solution  to  it  and  in  part  because  its 
characteristics are well suited to the requirements of a test of EMMA.

There is a clear and practical problem of prediction and, since the classification
of words as parts of speech can be done with a minimum of disagreement,
questions of criterion reliability may be ignored. Kaplan found (3) that for
reducing ambiguity in the meaning of words, a context consisting in two
preceding and two following words is approximately as effective as the full
sentence in which the word occurs. This suggests that it is not entirely
unrealistic to attempt the part-of-speech classification from the four-word
context to which we are effectively limited in this study by the small
data-handling capacity of the primitive form of EMMA that we chose to use. If
more context is necessary we should obtain some notion of the amount required.

EMMA in this study consisted in 500 marginal punched cards, five
needles, and pencil and paper, manipulated by a very conscientious young
woman, Miss Henrietta Chen. The EMMA principle can be programmed for faster
operations upon a larger corpus of data in a conventional electronic computer,
but in such computers the operations that mediate the predictions are
covert, like those we call mental in man. In this exploratory study it
seemed desirable to become thoroughly acquainted with the mediating operations
by watching them as they occurred. In our primitive EMMA most of the important
ones take place slowly and in the open.

The basic principle of EMMA is multiple conditional probability. Events
are considered as points in n-classification space and stored in a memory
as computer words in which a digit-position represents a classification (a
mutually exclusive and exhaustive set of classes) and a digit represents a
class to which an event is assigned. If information concerning one or more
classifications of an event is absent or ambiguous (as is often the case in
this study) the event is stored as a computer word in which more than one
digit occurs in a given position; such events appear in the memory as regions
rather than as points. Given a partial or ambiguous description of an
n-classificational event, i.e., an event specified by one or more classes in
each of n - m classifications, a prediction of the most likely class of the
nth classification for that event is obtained by searching the memory for
computer words that are identical in the classifications that are specified,
examining the associated distribution of cases in the nth classification and
(ordinarily) predicting the class containing the greatest frequency. If the
number of identical part-words found in the memory is insufficient for reliable
prediction the number of cases in the distribution may be increased by examining
similar part-words, e.g., words that match in n - m - 1 classifications.
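
The search-and-count procedure just described may be easier to follow as a small Python sketch. It is an illustration only, not the EMMA program itself. Several points are assumptions: each stored event is held as a tuple of code sets (a "region") plus one known class for the nth classification; two positions are taken to match when their code sets overlap; and the relaxation step allows exactly one mismatched position when fewer than `min_cases` matches are found. The names `matches`, `predict` and `min_cases` are hypothetical.

```python
from collections import Counter


def matches(query, stored, allowed_misses=0):
    """True if the stored event overlaps the query in all but `allowed_misses` positions."""
    misses = sum(1 for q, s in zip(query, stored) if not (q & s))
    return misses <= allowed_misses


def predict(memory, query, min_cases=1):
    """Return the modal class(es) of the nth classification among matching events.

    memory: list of (predictor_sets, target_class) pairs.
    query:  tuple of sets, one per specified classification.
    """
    dist = Counter()
    for allowed in (0, 1):  # exact match first, then relax by one position
        dist = Counter(
            target for stored, target in memory if matches(query, stored, allowed)
        )
        if sum(dist.values()) >= min_cases:
            break
    if not dist:
        return []
    top = max(dist.values())
    return sorted(c for c, n in dist.items() if n == top)


# A toy memory of four events with two specified classifications each.
memory = [
    ((frozenset({3, 4}), frozenset({1})), 3),
    ((frozenset({3}), frozenset({1})), 3),
    ((frozenset({4}), frozenset({1, 2})), 4),
    ((frozenset({5}), frozenset({2})), 5),
]
```

With this toy memory, a query that is ambiguous between codes 3 and 4 in the first classification still yields a single modal prediction, and a query with no exact match falls back on events that agree in all but one classification.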

Method 

In the present application of the EMMA principle, the events are words
in a text, each specified in the memory by six classifications, namely the
part (or parts) of speech indicated by a dictionary for five consecutive words
in a text together with a linguist's classification of part of speech for one
of the five. The problem assigned to EMMA was to predict the linguist's
classification of a word (presumably a correct classification), given the
dictionary classifications of that word and four adjoining words.*

Using similar programs, predictions were made for each word in each of
the five positions in sets of five words so that it was possible to compare
the accuracy of predictions based on each of five different contexts, these


* Appreciation is due to Miss Beverly Hung, the linguist who, by providing
the correct classifications, contributed to EMMA's training.
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consisting  respectively  in  four  following  words,  one  word  preceding  and  three 
following, two words preceding and two following, three preceding and one
following, and four preceding. Predictions were also made by combining data
from two or more context configurations, so that the study provides information
concerning the effects of both amount and kind of context on predictive
accuracy. The major independent variable in this study, however, is the size
of the memory, defined as the number of stored events utilized in making a
prediction from a single context. Predictions for a set of 116 words are
made with memory sizes of 99, 199, 299, 399, and 499 events. The procedure
is closely analogous to that of a human learning experiment in which the accuracy
of judgment of some characteristic of complex situations is tested after varying
amounts of experience in similar situations for which the correct judgment
has been indicated.

The material upon which the program operated was an arbitrarily selected
text of some five hundred consecutive words in Karl Stumpff's Planet Earth
(4, pp. 23-25).

The new conception, that the fixed stars are distant suns, began
to be accepted at about the same time as the heliocentric theory of
the solar system. At the end of the eighteenth century Frederick William
Herschel, who studied the fixed stars with his giant reflecting telescopes,
tried to estimate the size and shape of the stellar system. As
a basis for his investigations, he assumed that in the part of space
occupied by fixed stars they are distributed in fairly uniform density.
By counting all the stars that appeared in certain areas within the
range of his most powerful telescope, he created a relative scale for
the depth of the stellar system — at least in his "selective field."
By means of these calculations he came to the far-reaching conclusion
that the fixed stars occupy a lens-shaped space of great magnitude, and
that the borders of this galaxy are formed by the Milky Way, itself
made up of a great cluster of extremely remote stars. He assumed,
moreover, that our sun is situated near the center of the Milky Way,
whereas present-day astronomers have reached the conclusion that the
sun lies far away from this center.

Of course Herschel was not able to make any precise statements
concerning the true size and extent of the fixed star system. His
telescopes were very powerful for the times, but they were still far from
reaching the faint stars which are seen with the biggest modern telescopes.
Furthermore, he had only a vague idea of the distance of even the nearest
and brightest star. If we assume that the luminosity of the stars is
approximately that of the sun — a supposition that is certainly incorrect
as far as individual stars are concerned, and can only be applied to an
average of certain type of stars — then we must conclude that the
distance of even the nearest star is enormous when measured by the scale
of the solar system.

It was not until 1837, fifteen years after Herschel's death, that
Friedrich Wilhelm Bessel, an astronomer from Konigsberg, succeeded in
measuring accurately the perspective displacement, or parallax, of a
fixed star (Fig. 7). Bessel used a newly invented instrument of the
greatest precision, the heliometer. It is now known that former
experiments had failed because they were directed at the brightest
stars, which are not necessarily the nearest. Meanwhile, a further
discovery was made: the fact that the fixed stars are not fixed but
travel through space, though their motion is hardly perceptible because
they are so remote. Knowing this, Bessel sought out a star that
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was  near  enough  for  the  parallax  to  be  observable  —  and  made  his  search 
not among the brightest stars but among those which showed "proper
motion." Of these he noticed one in particular, a faint star (No. 61
in the constellation Cygnus) whose annual deviation on either side of
the  central  position  is  up  to  0.33  —  in  other  words,  about  one  5,W0th 

part  of  the  angle  at  which  we  see  the  diameter  of  the  full  moon.  From 

this, the distance of 61 Cygni could be calculated as more than 60
million million miles. Its light, traveling . . .

Coding

Each word and internal punctuation mark in the selected text was assigned
one or more numerals in the following code according to its classification as
one or more parts of speech in the Thorndike Century Senior Dictionary. This
operation provides Dictionary codes.

1.  Noun  (including  pronouns) 

2.  Verb  (transitive,  intransitive  and  auxiliary) 

3. Adjective (including definite and indefinite articles)

4.  Adverb 

5.  Preposition 

6.  Conjunction 

7. Miscellaneous (including all other parts of speech and internal
punctuation but not the final period of a sentence)

In carrying out the coding operation, specific questions arose and in each
case an arbitrary rule was formulated and followed in subsequent cases of the
same kind. Rules were restricted to those that can be programmed on a
conventional computer. These rules follow:

1. A final period is not treated as a word (though, since sentences
are programmed as separate units, the final period is used
elsewhere in the program).

2. If a word containing affixes, e.g., "stars", does not appear in
the dictionary, the stem form ("star") is used and the affixes
are ignored. (This rule obviously results in the loss of information
contained in affixes, but to use this information it
would be necessary to resort to algorithms and a deductive program,
which is against our principles in this study, or to
increase the amount and complexity of the information used by
EMMA in her intuitive predictions, which was not expedient. If
EMMA is ever used in practice there would certainly be no objection
to supplementing her intuition with logic and any relevant and
available empirical laws.)

3. All capitalized words not found in the dictionary are coded as
nouns.

4. Numerals are coded as noun (code 1) and adjective (code 3)
since their word equivalents, e.g., "two", are so classified in
the dictionary.

5. Multiple-word technical terms such as "fixed star" are treated
as two single words, e.g., "fixed" and "star".

6.  Hyphenated  words  are  treated  as  single  words  if  found  as  combin¬ 
ations  in  the  dictionary;  otherwise  as  two  words,  ignoring  the 
hyphen. 

Each word and internal punctuation mark was also assigned a single code
representing its classification as a part of speech by a linguist applying her
best judgment to any and all available information. This operation provides
the linguist's (or correct) code.
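
The dictionary-coding rules above are mechanical enough to sketch in Python. What follows is a hedged illustration, not the procedure actually used. `TOY_DICTIONARY` is a hypothetical stand-in for the Thorndike dictionary: only the entries for "the" and "that" follow codes shown in this report, and the rest are invented for the example. The suffix list is a crude assumption for rule 2, and the treatment of an unknown lower-case word (an empty code set) is not specified in the text.

```python
import re

# Hypothetical miniature dictionary: word -> set of part-of-speech codes (1-7).
TOY_DICTIONARY = {
    "the": {3, 4},
    "that": {1, 3, 4, 6},
    "star": {1, 2},
    "far": {1, 3, 4},
    "reach": {1, 2},
}
SUFFIXES = ("ing", "ed", "es", "s")  # assumed, very rough affix list


def lookup(word, dictionary):
    """Rule 2: use the word itself, else a stem obtained by dropping an affix."""
    w = word.lower()
    if w in dictionary:
        return set(dictionary[w])
    for suffix in SUFFIXES:
        stem = w[: -len(suffix)]
        if w.endswith(suffix) and stem in dictionary:
            return set(dictionary[stem])
    return None


def code_token(token, dictionary):
    """Return a list of (word, codes) pairs; hyphenated tokens may yield two."""
    if re.fullmatch(r"[,;:()\"]+", token):   # internal punctuation -> code 7
        return [(token, {7})]
    if re.fullmatch(r"\d+", token):          # rule 4: numerals -> codes 1 and 3
        return [(token, {1, 3})]
    codes = lookup(token, dictionary)
    if codes is not None:
        return [(token, codes)]
    if "-" in token:                         # rule 6: split unknown hyphenated words
        parts = [p for p in token.split("-") if p]
        return [pair for p in parts for pair in code_token(p, dictionary)]
    if token[:1].isupper():                  # rule 3: unknown capitalized word -> noun
        return [(token, {1})]
    return [(token, set())]                  # not covered by the rules in the text
```

Rule 1 (the final period) and rule 5 (multiple-word terms) need no code here, since they amount to dropping the period and coding each word separately.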


Punching

Five hundred single-hole marginal punched cards that provided 39 holes on
one side were used for the memory. One card was punched for each sequence of
five consecutive words in a sentence, words 1 to 5 on the first card, 2 to 6
on the second and so on. Note that the number of cards is four less than the
number of words in a sentence, with the result that, for example, the first
100 cards contain information concerning 116 words. There are exactly 100
words in each of the five positions. Seven consecutive holes were assigned to
each word, each hole representing one code number. Holes corresponding to the
dictionary code (or codes) of each word were notched. The linguist's coding
and, for convenience, the word itself, were written below the space assigned to
each word. Additional information, useful in sorting operations but not
necessary for the program, was punched elsewhere.
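
The card count follows directly from the sliding five-word window, as the short Python sketch below shows. The function names are hypothetical, and the sentence lengths used in the usage note are invented; the text states only that the first 100 cards cover 116 words, which is consistent with four sentences (4 x 4 = 16 words that never occupy the first position).

```python
from typing import List, Sequence, Tuple

WINDOW = 5  # words per card, as stated in the text


def cards_for_sentence(words: Sequence[str]) -> List[Tuple[str, ...]]:
    """One card per run of five consecutive words: 1-5, 2-6, and so on."""
    return [tuple(words[i:i + WINDOW]) for i in range(len(words) - WINDOW + 1)]


def card_count(sentence_lengths: Sequence[int]) -> int:
    """Total cards for a text; each sentence yields (length - 4) cards."""
    return sum(max(n - WINDOW + 1, 0) for n in sentence_lengths)
```

For instance, four sentences of 30, 29, 31 and 26 words (assumed lengths totaling 116) give `card_count([30, 29, 31, 26]) == 100`.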

Prediction Programs and Procedures

The part-of-speech classifications made by the linguist for the first 116
words in the text were predicted under 45 conditions, corresponding to the 25
combinations of five positions or contexts and five memory sizes, together
with 20 additional conditions obtained by combining data from more than one position.
The prediction program for words in the first position made with a 99-event
memory will serve as an example. The program has three branches.

Branch 1: If the word has a single dictionary code, that code is the prediction.

Branch 2: If the word has more than one dictionary code and appears in the
first position (is the first word in a consecutive set of five words in a
sentence), the memory is searched for all other events which match that event
in any combination of the dictionary codes for its five component words. For
these events the distribution of linguist's codes (for the words in the first
position) is compiled. The prediction is the modal code in this distribution.
If there is more than one mode there is, of course, more than one prediction.

Example of procedure in Branch 2:

The first word in the text, "The", appears in the first position on a
card carrying the following information: (Each card actually represents
five events, one for each position.)


Word               The    new    conception    (comma)    that

Position           1      2      3             4          5

Dictionary codes   3,4    3,4    1             7          1,3,4,6

Linguist's code    3      3      1             7          6

For the event that consists in the dictionary codes and the first-position
linguist's code there are 16 possible combinations of the dictionary codes
of the five component words, 33171, 43171, 34171, 44171, 33173, etc. A
five-needle sort for each combination drops all cards representing events
with a matching combination of dictionary codes. The predicted classification
for the word "the" is the modal code in the distribution of linguist's
codes in the first position on the dropped cards. The card for
which a prediction is being made is not included in the distribution so
that 100 cards provide a memory of 99 events. It is perhaps obvious that
once the sort has been made it is economical of time to examine the distribution
of linguist's codes in the other positions, i.e., to obtain the
second-position prediction for "new", the third-position prediction for
"conception" and so on.
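
The count of 16 combinations can be checked by enumerating the Cartesian product of the dictionary codes on the example card. The Python sketch below does only that; the variable names are illustrative.

```python
from itertools import product

# Dictionary codes for "The", "new", "conception", (comma), "that".
card_codes = [(3, 4), (3, 4), (1,), (7,), (1, 3, 4, 6)]

# One five-digit string per needle setting, e.g. "33171".
combinations = ["".join(str(code) for code in combo) for combo in product(*card_codes)]
```

The list has 2 x 2 x 1 x 1 x 4 = 16 entries, and it contains the five combinations quoted in the text.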

Branch 3: If the word does not appear in the first position (which is the case
for the last four words in each sentence, a total of 16 in the first 116 words
of the text), the dictionary code or codes is the prediction.

Similar predictions were made for the same 116 words in each of the five
positions and the process was repeated after adding 100, 200, 300, and 400
events to the original memory of 99.
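
The three branches for first-position predictions can be put together in one short Python sketch. This is an interpretation rather than a transcription of the card procedure, and it rests on assumptions: a card is represented as a pair (five sets of dictionary codes, five linguist's codes); a stored card counts as matching when its notched codes overlap the query's codes in every position, and each matching card is counted once; and when no card matches, the dictionary codes are returned, a case the text does not discuss. All names are hypothetical.

```python
from collections import Counter


def fs(*codes):
    """Shorthand for a set of dictionary codes."""
    return frozenset(codes)


def predict_first_position(index, cards):
    """Branches 1 and 2 for the word in position 1 of cards[index]."""
    dict_codes, _ = cards[index]
    target = dict_codes[0]
    if len(target) == 1:                       # Branch 1: unambiguous word
        return sorted(target)
    dist = Counter()                           # Branch 2: search the memory
    for j, (other_codes, other_linguist) in enumerate(cards):
        if j == index:
            continue                           # the card itself is left out
        if all(a & b for a, b in zip(dict_codes, other_codes)):
            dist[other_linguist[0]] += 1
    if not dist:
        return sorted(target)                  # assumed fallback, not in the text
    top = max(dist.values())
    return sorted(code for code, n in dist.items() if n == top)


def predict_without_card(word_codes):
    """Branch 3: a word that never occupies position 1 keeps its dictionary codes."""
    return sorted(word_codes)


# Toy memory; only the first card reproduces the example in the text.
cards = [
    ((fs(3, 4), fs(3, 4), fs(1), fs(7), fs(1, 3, 4, 6)), (3, 3, 1, 7, 6)),
    ((fs(3, 4), fs(3), fs(1, 2), fs(7), fs(6)), (3, 3, 1, 7, 6)),
    ((fs(4), fs(3, 4), fs(1), fs(7), fs(1)), (4, 3, 1, 7, 1)),
    ((fs(3, 4), fs(3), fs(1), fs(5, 7), fs(3)), (3, 3, 1, 7, 3)),
    ((fs(1), fs(2), fs(3), fs(1), fs(5)), (1, 2, 3, 1, 5)),
]
```

On this toy memory the prediction for "The" on the first card is code 3, since two of the three matching cards carry linguist's code 3 in the first position.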


Predictions were scored as follows: for each word, if there is a single
modal prediction that agrees with the linguist's coding of that word, the
score is 1.0. If there are two modal predictions and one agrees, the score
is .5, if three and one agrees, the score is .33, etc. If no prediction agrees
with the linguist's coding of that word, the score is 0. An average prediction
score was obtained for each combination of position and memory size.
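
The scoring rule amounts to giving each word a credit of one divided by the number of tied predictions when the correct code is among them. A minimal Python sketch follows; the function names are illustrative.

```python
from typing import Iterable, Sequence, Tuple


def score(predictions: Sequence[int], correct: int) -> float:
    """1/k if the correct code is among k tied predictions, otherwise 0."""
    return 1.0 / len(predictions) if correct in predictions else 0.0


def mean_score(cases: Iterable[Tuple[Sequence[int], int]]) -> float:
    """Average score over (predictions, correct code) pairs."""
    scores = [score(p, c) for p, c in cases]
    return sum(scores) / len(scores)
```

Under this rule a word with a single dictionary code that disagrees with the linguist scores 0, while a three-way tie that includes the correct code scores one third.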

Additional predictions were made by pooling distributions of linguist's
codes for the same words in more than one position. Combined-position
predictions were obtained with each of the five memory sizes with the following
combinations of positions: 1,5; 1,3,5; 1,2,3,4,5; and 2,3,4. Predictions
were the modal codes as before, and were scored in the same way as those for
single-position predictions. Branch 3 of the program was not used in
combined-position predictions whenever at least one single-position prediction was
obtainable for all words by means of Branches 1 and 2. In the case of the
combination of positions 1 and 5, for example, Branch 3 is not used since
predictions for the first four words in a sentence are obtained from position
1 data and for the last four words from position 5 data; computer predictions
(Branch 2) are possible for the intervening words. In the combination of
positions 2, 3, and 4, Branch 3 predictions are made only for the first and
last words of each sentence.
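
Pooling, as distinct from combining the finished predictions, means adding the frequency distributions from the several positions before the mode is taken. A small Python sketch under that reading is given below; `pooled_prediction` is a hypothetical name, and the distributions in the usage note are invented.

```python
from collections import Counter
from typing import Iterable, List


def pooled_prediction(distributions: Iterable[Counter]) -> List[int]:
    """Sum the per-position distributions of linguist's codes, then take the mode(s)."""
    pooled = sum(distributions, Counter())
    if not pooled:
        return []
    top = max(pooled.values())
    return sorted(code for code, n in pooled.items() if n == top)
```

For example, if position 1 data give two cases of code 3 and one of code 4, and position 5 data give two cases of code 4, the pooled distribution favors code 4 even though position 1 alone would predict code 3.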

A  third  method  of  prediction  was  tried  in  which  single-position  predictions 
rather  than  distributions  were  combined,  but  this  was  found  to  be  impractical. 

Results  and  Discussion 

The mean scores for the five single-position predictions obtained with
each memory size are shown in Table 1 and Figure 1. The negatively accelerated

Table 1

Mean scores for single-position predictions of part-of-speech
classifications of 116 words

                          Memory Size
Position         99     199     299     399     499

1               .60     .66     .71     .71     .75
2               .69     .74     .79     .79     .80
3               .65     .71     .76     .79     .79
4               .72     .74     .75     .74     .74
5               .69     .71     .73     .73     .73

Grand Means     .67     .71     .75     .75     .76

increase (with sampling deviations) resembles a typical learning curve. As
might be expected, accuracy of prediction varies with the context used in
making predictions. Although the computer improves upon the predictions
obtained from the dictionary (which for the same 116 words scores .67), the
maximum accuracy attained, a score of .80 for position 2, is obviously too
low for practical use.

The primitive form of EMMA used in this study did not permit an increase
in the number of context words used in single-position predictive
configurations without a corresponding decrease in the fineness of their
classification. However, some indication of the effect of increasing the amount
of context information upon predictive accuracy was obtained by combining
predictions for words in two or more positions. Results obtained for
four combined predictions are shown in Table 2 and the grand mean scores
are shown graphically in Fig. 1.

Table 2

Mean scores for combined-position predictions of part-of-speech
classifications of 116 words

                         Memory Size
Positions       99     199     299     399     499

1, 5           .69     .74     .76     ...     .84
2, 3, 4        ...     ...     ...     .84     .84

Grand means    .74     .78     .81     .82     .83

Maximum accuracy is increased, to .85 for the combination prediction from
positions 1, 3, and 5, but visual extrapolation to asymptote gives no reason to
suppose that this method of extending the context used in predictions will
provide a usable level of accuracy for any memory size. An alternative is to
base single-position predictions upon larger context configurations. This will
require, of course, a larger memory. To increase the number of context words
used in prediction from 4 to 9 while retaining 7-fold classifications increases
the number of possible configurations by a factor of 2401, from 16,807 (7^5)
to 40,353,607 (7^9). Because of the restrictions imposed by grammatical rules
(assuming that they represent actual linguistic behavior) the number of actual
configurations will increase much less rapidly, however. For this and other
reasons a commensurate increase in memory size will not be required.
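
The configuration counts quoted above follow from simple exponentiation. A short sketch, assuming (from the quoted figures 16,807 and 40,353,607) that the two configurations comprise 5 and 9 classified items respectively; the function name is illustrative.

```python
CLASSES = 7  # 7-fold part-of-speech classification


def configurations(items, classes=CLASSES):
    """Number of possible configurations of `items` classified words."""
    return classes ** items


small = configurations(5)   # configuration size implied by 16,807
large = configurations(9)   # configuration size implied by 40,353,607
factor = large // small     # growth in the number of possible configurations

print(small, large, factor)
```

This is an upper bound only: as noted above, grammatical restrictions mean that far fewer configurations would actually occur in text.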

Although the maximum predictive accuracy attained with the limited context
information used in this study is not sufficient for practical purposes, the
feasibility of the method is demonstrated in principle. The practical question
remains, however, whether application of the EMMA principle on a larger scale
and with modifications of detail will provide predictions of sufficient accuracy.
It is to be expected that predictive accuracy will be related not only to the
amount and detail of information used in making predictions, but also to its
character. For example, it is obvious that predictive accuracy would be
improved by use of the affix information that was ignored in the present study.
This information can be incorporated by means of conventional deductive
techniques as suggested above, or more simply by applying the EMMA principle as
in the present study but utilizing a more detailed dictionary.

Summary 

A computer program based on conditional probabilities is used to predict
the classification of words in text as parts of speech from ambiguous dictionary
classifications of these words and a context of four adjoining words.
Predictions are based, not on empirical laws (in this case grammatical rules)
but on examination of regularities in empirical events (linguistic behavior of
a writer) represented in detail in memory. Accuracy of prediction increases
as the size of the memory increases and varies also with the character of the
context information from which the prediction is made, but the maximum accuracy
obtained in this study was not sufficiently high for the requirements of
practical use. It is proposed to examine predictions based on larger and more
detailed contexts (5).
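
The idea of predicting from stored events rather than from rules can be sketched in a few lines. This is not the program used in the study. It is a toy illustration under stated assumptions: the class ContextMemory, the class labels, the five-item configuration (the word's dictionary classes plus four context classes) and the fallback for unseen configurations are all hypothetical.

```python
from collections import Counter, defaultdict


class ContextMemory:
    """Toy memory mapping a configuration to counts of observed classes."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    @staticmethod
    def _key(dictionary_classes, context):
        # A configuration is the set of classes the dictionary allows for
        # the word, plus the classes of its four adjoining words.
        return (frozenset(dictionary_classes), tuple(context))

    def store(self, dictionary_classes, context, true_class):
        """Record one observed event: this configuration had this class."""
        self._counts[self._key(dictionary_classes, context)][true_class] += 1

    def predict(self, dictionary_classes, context):
        """Pick the class seen most often with this configuration."""
        seen = self._counts.get(self._key(dictionary_classes, context))
        if seen:
            # Highest estimated conditional probability given the context.
            return seen.most_common(1)[0][0]
        # Unseen configuration: fall back on the dictionary (an assumption).
        return sorted(dictionary_classes)[0]


memory = ContextMemory()
context = ("ART", "ADJ", "PREP", "ART")
memory.store({"N", "V"}, context, "N")
memory.store({"N", "V"}, context, "N")
memory.store({"N", "V"}, context, "V")

print(memory.predict({"N", "V"}, context))
print(memory.predict({"V", "ADJ"}, ("N", "N", "N", "N")))
```

In such a scheme accuracy can only improve as more events are stored, which is consistent with the learning-curve behavior reported above.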
APPENDIX XIX - Concerning the Matching Formulae


First, having 2 clusters in common seems more than twice as significant
as having 1 in common; likewise for 3 over 2. For instance, if n_a = 5,
n_b = 5, n_ab = 2, the words are more nearly synonymous than if n_a = 5, n_b = 5,
n_ab = 1. So the original contribution does not seem to give us a satisfactory
ordering for our word pairs. At least as long as we use more or less normal
thesauri, taking the n_ab-th root seems to be the simplest modification, giving
us the word pair ordering which we think we want. This modification does
not change the rankings for fixed n_ab - e.g., n_a = 5, n_b = 10 still ranks
even with n_a = 7, n_b = 7 for each corresponding value of n_ab. The effect of
the new formula is to raise and flatten the graph for n_ab = 2, and more so
for n_ab = 3. Thus s, for n_ab = 3, n_a = 5, n_b = 5, is 4 times as large as for
n_ab = 1; for n_a = 7 = n_b, n_ab = 2, s is 3 times as large as for n_ab = 1,
and for n_ab = 3 is 5 times as large.
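
The matching formula itself is not restated in this appendix. Purely as an illustration, suppose the original semantic rating is n_ab / sqrt(n_a n_b) and the modified rating is its n_ab-th root. Both forms are assumptions made for this sketch, as are the function names. Under these assumptions the ratios quoted above are reproduced only approximately (roughly 4.2, 3.7 and 5.3 against the stated 4, 3 and 5).

```python
import math


def s_original(n_a, n_b, n_ab):
    """Assumed original rating: shared clusters over the geometric mean."""
    return n_ab / math.sqrt(n_a * n_b)


def s_rooted(n_a, n_b, n_ab):
    """Assumed modified rating: the n_ab-th root of the original rating."""
    if n_ab == 0:
        return 0.0  # no clusters in common, so no semantic match
    return s_original(n_a, n_b, n_ab) ** (1.0 / n_ab)


def ratio_to_single(n_a, n_b, n_ab):
    """How many times larger the rating is than for one shared cluster."""
    return s_rooted(n_a, n_b, n_ab) / s_rooted(n_a, n_b, 1)


for n_a, n_b, n_ab in [(5, 5, 3), (7, 7, 2), (7, 7, 3)]:
    print(n_a, n_b, n_ab, round(ratio_to_single(n_a, n_b, n_ab), 2))
```

Because the assumed rating depends on n_a and n_b only through their product, pairs such as (5, 10) and (7, 7) rank almost evenly for every n_ab, as the text describes.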

Secondly, the subject-predicate division may be inadequate. Predicate
adjectives and objects seem to be at least as important as verbs and more
important than adverbs. This suggests a division of the sentence into S, V
and P-O with adverbs designated V^. Unfortunately there is no simple
geometric schema for this, since a line cannot have more than two sides to
its origin. Nevertheless, with the proper equivalences, the same table as
before will suffice. Another question: an adjective modifying the subject is
often interchangeable with verb or object - should it always receive a lower
rating?

Thirdly, we have a FLEX rating f and semantic rating s giving us a
rating for syntactic structure similarity and semantic nearness. We have
no rating for relative information content. Supply, component, plant, etc.
receive a semantic proximity rating of 1 for a perfect match and a FLEX
rating of 1 if one of them is a subject in both question and text sentence;
yet diet matches nutrition with only a .58 semantic rating and even on a
[illegible] to P2 flexing gets rated only a .63, for a value in the sentence of
only .37, compared with the words of low information count which may contribute 1.
So, we are experimenting with rating words on information content c. Some
success has been achieved with the following content formula


\[ c = \frac{2}{n_a + 1} \]

which is just 1 when n_a = 1, the minimum number of
semantic cluster memberships. (Thus it is never greater than unity.)

Given a word a_i in the question sentence and a word b_j in the text,
we would compute s_ij, c_j and f_ij, and then the total rating of the word as
to its importance or information value v relative to the question would be:

\[ v_{ij} = s_{ij} \, c_j \, f_{ij} \]
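
The product form of the word value can be illustrated with the diet/nutrition figures quoted earlier (semantic rating .58, FLEX rating .63). The sketch below is an assumption-laden illustration, not the authors' routine: it takes the content rating to be c = 2/(n + 1), where n is the word's number of cluster memberships, and the function names and the cluster count of 9 are hypothetical.

```python
def content_rating(n_clusters):
    """Assumed content rating: 1 for a single cluster, smaller for more."""
    if n_clusters < 1:
        raise ValueError("a word must belong to at least one cluster")
    return 2.0 / (n_clusters + 1)


def word_value(s, f, c=1.0):
    """Information value of a matched word: semantic x content x FLEX."""
    return s * c * f


# diet / nutrition before any content rating is applied
print(round(word_value(0.58, 0.63), 2))

# a perfect match on a hypothetical word belonging to 9 clusters
print(round(word_value(1.0, 1.0, content_rating(9)), 2))
```

With a content rating of this kind a perfectly matched but widely used word no longer contributes a full 1, which is the imbalance the content formula is meant to correct.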

Clearly, the new formula does not alter the relative value v of words
for fixed n_a, n_b - e.g., a perfect match for n_a = 3, n_b = 3 is still 3 times
what it is for n_ab = 1, and n_ab = 2 is still 2½ times as much. However,
n_ab = n_a = 2, n_b = 3 gives the same value as n_ab = 1, n_a = 1, n_b = 5,
and n_ab = 3, n_a = 7, n_b = 7 as far as relative information value is concerned.

Fourthly, the denominator in the formula for total match, i.e. the
geometric mean of the number of words in the sentence, is so unrelated to
our manner of rating that it may be possible for a sentence to be rated
above 1. Worse, the explicit answer to the question may easily be a 20
word, unbroken sentence with only a few words corresponding to the
question. We are currently experimenting with the following kind of formulae

[first formula illegible]

and

T = [illegible]

where [illegible] is the computation of the perfect matches in the question with
itself and Σ_j max (v_ij) is to be interpreted as the maximum of the perfect
flex matches of the question.

We are well aware that accurate evaluation of our corpus sentences
with respect to the question is probably not feasible - in fact we almost
assume not. We want to maximize the output ratio [illegible] as well as the
ratio [illegible]. It is important that the
1st ratio be very high at the beginning of the output and that it be low
only when the questioner probably has sufficient information. Our experience
so far indicates that we often have more than one chance to retrieve a given
information paragraph, especially if one of the topic sentences is vaguely
worded. Also high information sentences may be more specific in their language
than low information sentences. It is also possible to put questions in more
than one format.

Even with the above suggestions and even if they work, we run into big
junk troubles. When a person asks about navigation on a particular river,
how can we avoid giving him information on each of the world's rivers? Or
if he wishes to know about a certain kind of lens, how to avoid giving him
everything? Pronoun use may really add to the difficulty here.

Especially, though, when the questioner is so specific as to use a name,
it seems that we should not even evaluate the sentences of an article which
use neither the name nor a match with the name. (If interested in a certain
astronaut, his ship would be a match.)
A further suggestion is this (assuming the questions phrased without
excess words): find the best semantic match for each word in the question
sentence which exists in the article under consideration; then multiply
every sentence valuation by a function of those (semantic?) matches - i.e.,
by a measure of the maximum probable relevance of the article as a whole to
the question.
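
This suggestion can be sketched as follows. The text does not say which function of the best matches is intended, so the mean is used here purely as an assumption; the toy rating table and all function names are likewise hypothetical.

```python
# Toy semantic ratings; only the diet/nutrition figure comes from the text.
TOY_RATINGS = {frozenset(("diet", "nutrition")): 0.58}


def toy_semantic_match(word_a, word_b):
    """Hypothetical semantic rating: 1 for identical words, else a lookup."""
    if word_a == word_b:
        return 1.0
    return TOY_RATINGS.get(frozenset((word_a, word_b)), 0.0)


def article_relevance(question_words, article_words, semantic_match):
    """Mean, over question words, of the best match found in the article."""
    if not question_words or not article_words:
        return 0.0
    best = [
        max(semantic_match(q, w) for w in article_words)
        for q in question_words
    ]
    return sum(best) / len(best)


def rescale(sentence_values, relevance):
    """Multiply every sentence valuation by the article-level relevance."""
    return [value * relevance for value in sentence_values]


relevance = article_relevance(
    ["diet", "plant"], ["nutrition", "plant", "supply"], toy_semantic_match
)
print(round(relevance, 2))
print([round(v, 3) for v in rescale([0.5, 1.0], relevance)])
```

An article in which some question word finds no match at all is pulled down as a whole, so that its sentences rank below those of articles matching every question word.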

This is another advantage of having the articles or chapters indexed
by cluster number rather than having sentence numbers so indexed.

This example illustrates the sensitivity of the sentence ranking to the
form of the denominator which is chosen.


R.  V.  Cook 
Q. Groups of scholars currently working at Rand are carrying out a study
of Russian derivational morphology.

I. Rand Corporation is studying Russian from the viewpoint of word-formation
processes.

N. A group study of Russia has been brought to the attention of scholars.